I am trying to create dynamic dag but seems to be failing at the minute. I came across creating the DAG object in two different:
from airflow.models import DAG https://airflow.apache.org/concepts.html#latest-run-only
from airflow import DAG https://airflow.apache.org/tutorial.html
This really confused me because within the same documentation there are two ways of instantiating DAG object.
Both are importing the same DAG class. Just an attribute of how python imports works.
When you do from airflow.models import DAG python is importing the models file and assigning the variable DAG to the DAG class defined in the models file.
When you do from airflow import DAG python is importing the variable DAG defined in init.py, which is in fact just from airflow.models import DAG.
A minimal version being:
models.py
class DAG():
pass
init.py
from airflow.models import DAG
dags/dag_file.py
# import __init__.py which imports models.py which contains DAG
from airflow import DAG
# or this which just imports models.py which contains DAG
from airflow.models import DAG
All that being said, if your dynamic DAG is failing, I doubt it's related to this import
Related
we want to read cli input pass to dag from UI during Dagtrigger in Dag.
i tried below code but its not working. here i am passing input as {""kpi":"ID123"}
and i want to print this ip value in my function get_data_from_bq
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.operators.python_operator import PythonOperator
from airflow import models
from airflow.models import Variable
from google.cloud import bigquery
from airflow.configuration import conf
LOCATION = Variable.get("HDM_PROJECT_LOCATION")
PROJECT_ID = Variable.get("HDM_PROJECT_ID")
client = bigquery.Client()
kpi='{{ kpi}}'
# default arguments
default_dag_args = {
'start_date':days_ago(0),
'retries': 0,
'project_id': PROJECT_ID
}
# Setting airflow environment varriable,getting hdm_batch_details data and updating it
def get_data_from_bq(**kwargs):
print("op is:")
print(kpi)
#Dag Defination
with models.DAG(
'00_test_sql1',
schedule_interval=None,
default_args=default_dag_args) as dag:
v_run_sql_01 = PythonOperator(
task_id='Run_SQL',
provide_context=True,
python_callable=get_data_from_bq,
location=LOCATION,
use_legacy_sql=False)
v_run_sql_01
Note: I don't want to use any operator to read data passed from cli
Note: I don't want to use any operator to read data passed from cli
This is impossible. Dag Run is only created when there are tasks to run.
You should understand that :
DAG + its top level code - builds DAG structure consisting of Tasks
DAG Run -> is single instance of DAG run which contains Task Instances to be executed. Dag Run simply consists of task instancess that belong to the DAG run with the given "dag run".
The configuration that you pass is "dag_run.conf" not "dag.conf" - which meanss that it is only specified for the DagRun, which is valid only for all Task Instances that belong to it.
Only Task Instances have access to dag_run.conf
I am new to airflow and need some direction on this one...
I'm creating my first dag that uses a subdag and importing the subdag operator
`from airflow.operators.subdag import SubDagOperator`
however I keep getting the flowing error
"Broken DAG: [/usr/local/airflow/dags/POC_Main_DAG.py] No module named 'airflow.operators.subdag'"
I also tried importing the dummy operator ang got the same error.
on the other hand the below operators seem to be imported as expected.
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.mysql_operator import MySqlOperator
appreciate help on resolving this issue
thanks in advance!
What version of Airflow are you using?
If you are using Airflow 1.10.x, use the following:
from airflow.operators.subdag_operator import SubDagOperator
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
In Airflow >=2.0.0, use the following:
from airflow.operators.subdag import SubDagOperator
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
I am using Version : 1.10.4. i changed the code in the way you suggested and now it works.
thanks for the help!
Was attempting to add a new DAG to our Google Cloud Composer instance - we have 32+ DAGs currently - and doing the usual things in https://cloud.google.com/composer/docs/how-to/using/managing-dags doesn't appear to be having any effect - we can't see these DAGs in the webserver/UI and I don't see that they are necessarily being loaded. I do see them being copied to the appropriate bucket in the logs but nothing beyond that.
I even tried setting a dummy environment variable to kick off a full restart of the Composer instance but to no avail.
Finally I've put together an entirely stripped down DAG and attempted to add it. Here is the DAG:
from airflow import models
from airflow.contrib.operators import kubernetes_pod_operator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
dag = models.Dag(
dag_id="test-dag",
schedule_interval=None,
start_date=datetime(2020, 3, 9),
max_active_runs=1,
catchup=False,
)
task_test = DummyOperator(dag=dag, task_id="test-task")
Even this simple DAG isn't getting picked up so I'm wondering what I can try next. I looked through https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg in an effort to see if perhaps there was anything I might tweak in here in terms of DagBag loading time limits, etc. but nothing jumps off. Totally stumped here.
Your example is not picked up by my environment either. However, I've tried with the following format and was picked up without issues:
from airflow import DAG
from datetime import datetime
from airflow.contrib.operators import kubernetes_pod_operator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
with DAG(
"my-test-dag",
schedule_interval=None,
start_date=datetime(2020, 3, 9),
max_active_runs=1,
catchup=False) as dag:
task_test = DummyOperator(dag=dag, task_id="my-test-task")
We eventually figured out the issue:
dag = models.Dag(
should be
dag = models.DAG(
I have created a new DAG using the following code. It is calling a python script.
Code:
from __future__ import print_function
from builtins import range
import airflow
from airflow.operators.python_operator import PythonOperator
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
args = {
'owner': 'admin'
}
dag = DAG(
dag_id='workflow_file_upload', default_args=args,
schedule_interval=None)
t1 = BashOperator(
task_id='testairflow',
bash_command='python /root/DataLake_Scripts/File_Upload_GCP.py',
dag=dag)
I have placed it in $airflowhome/dags folder.
after that I have run :
airflow scheduler
I am trying to see the DAG in WebUI however it is not visible there. There is no error coming.
I've met the same issue.
I figured out that the problem is in initial sqlite db. I suppose it's some feature of airflow 1.10.3
Anyway I solved the problem using postgresql backend.
These links will help you:
link
link
link
All instructions are suitable for python 3.
You'll see your DAG after execution of 'airflow webserver' and 'airflow scheduler' commands.
Also notice that you should call 'sudo service postgresql restart' exactly before 'airflow initdb' command.
We are using Apache 1.9.0. I have written a snowflake hook plugin. I have placed the hook in the $AIRFLOW_HOME/plugins directory.
$AIRFLOW_HOME
+--plugins
+--snowflake_hook2.py
snowflake_hook2.py
# This is the base class for a plugin
from airflow.plugins_manager import AirflowPlugin
# This is necessary to expose the plugin in the Web interface
from flask import Blueprint
from flask_admin import BaseView, expose
from flask_admin.base import MenuLink
# This is the base hook for connecting to a database
from airflow.hooks.dbapi_hook import DbApiHook
# This is the snowflake provided Connector
import snowflake.connector
# This is the default python logging package
import logging
class SnowflakeHook2(DbApiHook):
"""
Airflow Hook to communicate with Snowflake
This is implemented as a Plugin
"""
def __init__(self, connname_in='snowflake_default', db_in='default', wh_in='default', schema_in='default'):
logging.info('# Connecting to {0}'.format(connname_in))
self.conn_name_attr = 'snowflake_conn_id'
self.connname = connname_in
self.superconn = super().get_connection(self.connname) #gets the values from Airflow
{SNIP - Connection stuff that works}
self.cur = self.conn.cursor()
def query(self,q,params=None):
"""From jmoney's db_wrapper allows return of a full list of rows(tuples)"""
if params == None: #no Params, so no insertion
self.cur.execute(q)
else: #make the parameter substitution
self.cur.execute(q,params)
self.results = self.cur.fetchall()
self.rowcount = self.cur.rowcount
self.columnnames = [colspec[0] for colspec in self.cur.description]
return self.results
{SNIP - Other class functions}
class SnowflakePluginClass(AirflowPlugin):
name = "SnowflakePluginModule"
hooks = [SnowflakeHook2]
operators = []
So I went ahead and put some print statements in Airflows plugin_manager to try and get a better handle on what is happening. After restarting the webserver and running airflow list_dags, these lines were showing the "new module name" (and no errors
SnowflakePluginModule [<class '__home__ubuntu__airflow__plugins_snowflake_hook2.SnowflakeHook2'>]
hook_module - airflow.hooks.snowflakepluginmodule
INTEGRATING airflow.hooks.snowflakepluginmodule
snowflakepluginmodule <module 'airflow.hooks.snowflakepluginmodule'>
As this is consistent with what the documentation says, I should be fine using this in my DAG:
from airflow import DAG
from airflow.hooks.snowflakepluginmodule import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
But the web throws this error
Broken DAG: [/home/ubuntu/airflow/dags/test_sf2.py] No module named 'airflow.hooks.snowflakepluginmodule'
So the question is, What am I doing wrong? Or have I uncovered a bug?
You need to import as below:
from airflow import DAG
from airflow.hooks import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
OR
from airflow import DAG
from airflow.hooks.SnowflakePluginModule import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
I don't think that airflow automatically goes through the folders in your plugins directory and runs everything underneath it. The way that I've set it up successfully is to have an __init__.py under the plugins directory which contains each plugin class. Have a look at the Astronomer plugins in Github, it provides some really good examples for how to set up your plugins.
In particular have a look at how they've set up the mysql plugin
https://github.com/airflow-plugins/mysql_plugin
Also someone has incorporated a snowflake hook in one of the later versions of airflow too which you might want to leverage:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/snowflake_hook.py