The airflow webserver runs without problems.
The airflow scheduler, however, prints this error message:
Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
airflow.cfg:
sql_alchemy_conn = mysql+pymysql://root:mypassword@localhost:3306/airflow
Have you set $AIRFLOW_HOME wherever you run scheduler too?
Looks like the scheduler is not picking up the airflow.cfg file at all.
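One way to check this is to resolve the config path the same way in the shell that runs the webserver and the shell that runs the scheduler. The sketch below assumes the default layout, where Airflow reads $AIRFLOW_HOME/airflow.cfg and falls back to ~/airflow when the variable is unset; the helper name expected_cfg_path is made up for illustration.

```python
import os
from pathlib import Path


def expected_cfg_path(env=None):
    """Return the airflow.cfg path implied by AIRFLOW_HOME (hypothetical helper).

    Assumes the default layout: $AIRFLOW_HOME/airflow.cfg, with ~/airflow
    used when AIRFLOW_HOME is not set.
    """
    env = os.environ if env is None else env
    home = env.get("AIRFLOW_HOME") or "~/airflow"
    return Path(home).expanduser() / "airflow.cfg"


if __name__ == "__main__":
    path = expected_cfg_path()
    print(f"config file: {path} (exists: {path.exists()})")
```

If the two shells print different paths, the scheduler is most likely reading a default config that still points at sqlite.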
I need to run dags in parallel but do not need significant scaling, so LocalExecutor can do the job just fine. I looked through the Airflow docs and first created a MySQL database:
CREATE DATABASE airflow_db CHARACTER SET utf8;
CREATE USER <user> IDENTIFIED BY <pass>;
GRANT ALL PRIVILEGES ON airflow_db.* TO <user>;
I then modify the following parameters in the airflow.cfg file:
executor = LocalExecutor
sql_alchemy_conn = mysql+mysqlconnector://<user>:<pass>@localhost:3306/airflow_db
When I run airflow db init, I run into the following error message:
AttributeError: 'MySQLConverter' object has no attribute '_dagruntype_to_mysql'
During handling of the above exception, another exception occurred:
TypeError: Python 'dagruntype' cannot be converted to a MySQL type
Please note that nothing else in the airflow.cfg file was altered and that using the default SequentialExecutor with sqlite lets everything run just fine. Also note that I am using Airflow version 2.2.0.
I found the solution to my own question. Instead of using the mysqlconnector, I used the pymysql driver:
pip install PyMySQL
The airflow.cfg parameters can then be adjusted as follows:
sql_alchemy_conn = mysql+pymysql://<user>:<pass>@localhost:3306/airflow_db
All else can stay the same.
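If the user name or password contains characters such as @ or :, the connection string can break in less obvious ways. A minimal sketch of building it with percent-encoding follows; mysql_conn_uri and the sample credentials are placeholders, not part of Airflow.

```python
from urllib.parse import quote_plus


def mysql_conn_uri(user, password, host="localhost", port=3306,
                   database="airflow_db", driver="pymysql"):
    """Build a SQLAlchemy-style MySQL URI (hypothetical helper).

    The user and password are percent-encoded so characters such as
    '@', ':' or '/' do not break the URI.
    """
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder credentials, for illustration only.
print(mysql_conn_uri("airflow_user", "p@ss:word"))
# prints mysql+pymysql://airflow_user:p%40ss%3Aword@localhost:3306/airflow_db
```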
On Airflow 2 my dag is not showing on the UI, and I'm getting DAG Import Errors (...) for it.
The error message is insufficient for me to debug (it's a custom operator, with a lot of custom logic - so I don't want to get into details of the error itself).
On Airflow 1.X I could use cli:
airflow list_dags
to get a more detailed debug message. Is there anything analogous in Airflow 2?
I'm looking for a CLI command or UI option that gives me a more detailed error message than the one I'm getting on the main screen of the webserver.
As described in Airflow's documentation, to test DAG loading you can simply run:
python your-dag-file.py
If there is any problem during the DAG loading phase you will get a stack trace here.
The later sections also describe how to test custom operators.
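If you would rather capture the stack trace from a script than read it off the terminal, the same idea can be sketched with the standard library alone. This is only an approximation of running the file with python; try_import is a made-up helper name, and the file has to be importable in the environment where Airflow and your custom operators are installed.

```python
import importlib.util
import traceback


def try_import(path):
    """Import a .py file and return the traceback text, or None on success."""
    spec = importlib.util.spec_from_file_location("dag_under_test", path)
    module = importlib.util.module_from_spec(spec)
    try:
        # Executes the file's top-level code, as DAG parsing would.
        spec.loader.exec_module(module)
    except Exception:
        return traceback.format_exc()
    return None
```

Calling try_import("your-dag-file.py") returns the full traceback as a string when loading fails, which can then be logged or compared across environments.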
As explained in the upgrading manual, airflow list_dags has been changed to airflow dags list.
The full syntax is:
airflow dags list [-h] [-o table, json, yaml] [-S SUBDIR]
for more information see docs
So, I have a problem with even the blank Airflow installation.
As soon as I try to run
airflow test tutorial print_date 2015-06-01
I get a raised exception which says
PendingDeprecationWarning: The requested task could not be added to the DAG because a task with task_id create_tag_template_field_result is already in the DAG. Starting in Airflow 2.0, trying to overwrite a task will raise an exception.
What is the reason for this (as I made literally no changes to the installation whatsoever)?
I also got that when, in a previous installation, I tried to run my own dag... but the "create_tag_template_field_result" was nowhere to be found in my code.
You can set the config option load_examples = False to solve it.
The test command calls the get_dag function, which constructs a DagBag object; the DagBag constructor in turn calls the collect_dags function.
When the config option LOAD_EXAMPLES is True (the default), collect_dags collects all the DAGs in the example path, and that is where the task create_tag_template_field_result comes from.
collect_dags also calls add_task for every example task, which is where the create_tag_template_field_result task gets added a second time.
It may be that the task was first added during the quickstart without you noticing.
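For reference, the option lives in the [core] section of airflow.cfg. The sketch below only illustrates which section and key are involved, using a tiny made-up config string; in practice it is simpler to edit the line by hand, since rewriting the real file with configparser would drop its comments.

```python
import configparser

# Minimal, made-up stand-in for airflow.cfg.
SAMPLE_CFG = """\
[core]
executor = SequentialExecutor
load_examples = True
"""


def disable_examples(cfg_text):
    """Parse cfg_text and return a parser with [core] load_examples = False."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parser["core"]["load_examples"] = "False"
    return parser


parser = disable_examples(SAMPLE_CFG)
print(parser.getboolean("core", "load_examples"))  # prints False
```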
This warning occurs in
/usr/local/lib/python3.7/dist-packages/airflow/example_dags/example_complex.py
so I removed or renamed this file (for example, to a name that is not loaded, such as *.py.back).
I had the same error with a fresh install.
I don't know if this helps, but I downgraded Airflow to version 1.10.10 (with Python 3.7) and the error was gone.
I have a Spring task app deployed on PCF. The app gets an OutOfMemory exception, but the task does not terminate.
Many people suggested that setting -XX:OnOutOfMemoryError="kill -9 %p" solves this problem. How can I set it on PCF?
When you run an app on Cloud Foundry, the Java buildpack will run and emit a start command which includes a Java agent that properly handles this for you. It's called the jvmkill agent.
https://github.com/cloudfoundry/jvmkill#overview
This will monitor your app for OOMEs and, if one happens, print some debug info and kill the app. I believe this is exactly the behavior that you're discussing above, but unlike the approach you mentioned, this method prints debug info prior to killing the app and IMHO is generally more reliable.
For tasks running on Cloud Foundry, the Java buildpack still installs the kill agent, but it cannot actually insert the kill agent into the start command for your task. This is because CF tasks take the start command entirely from the user.
The general recommendation for starting Java based tasks on CF, is to take the start command generated by the Java buildpack to run your app or another app with the same memory limit and adjust it to start your task instead.
For example, here is the start command generated for Spring Music:
JAVA_OPTS="-agentpath:$PWD/.java-buildpack/open_jdk_jre/bin/jvmkill-1.16.0_RELEASE=printHeapHistogram=1 -Djava.io.tmpdir=$TMPDIR -XX:ActiveProcessorCount=$(nproc) -Djava.ext.dirs= -Djava.security.properties=$PWD/.java-buildpack/java_security/java.security $JAVA_OPTS" && CALCULATED_MEMORY=$($PWD/.java-buildpack/open_jdk_jre/bin/java-buildpack-memory-calculator-3.13.0_RELEASE -totMemory=$MEMORY_LIMIT -loadedClasses=27062 -poolType=metaspace -stackThreads=250 -vmOptions="$JAVA_OPTS") && echo JVM Memory Configuration: $CALCULATED_MEMORY && JAVA_OPTS="$JAVA_OPTS $CALCULATED_MEMORY" && MALLOC_ARENA_MAX=2 SERVER_PORT=$PORT eval exec $PWD/.java-buildpack/open_jdk_jre/bin/java $JAVA_OPTS -cp $PWD/.:$PWD/.java-buildpack/container_security_provider/container_security_provider-1.16.0_RELEASE.jar org.springframework.boot.loader.WarLauncher
Note the -agentpath:$PWD/.java-buildpack/open_jdk_jre/bin/jvmkill-1.16.0_RELEASE=printHeapHistogram=1 part, which starts the jvmkill agent.
Now, if I want to adjust this to run java -version, I could do the following:
JAVA_OPTS="-agentpath:$PWD/.java-buildpack/open_jdk_jre/bin/jvmkill-1.16.0_RELEASE=printHeapHistogram=1 -Djava.io.tmpdir=$TMPDIR -XX:ActiveProcessorCount=$(nproc) -Djava.ext.dirs= -Djava.security.properties=$PWD/.java-buildpack/java_security/java.security $JAVA_OPTS" && CALCULATED_MEMORY=$($PWD/.java-buildpack/open_jdk_jre/bin/java-buildpack-memory-calculator-3.13.0_RELEASE -totMemory=$MEMORY_LIMIT -loadedClasses=27062 -poolType=metaspace -stackThreads=250 -vmOptions="$JAVA_OPTS") && echo JVM Memory Configuration: $CALCULATED_MEMORY && JAVA_OPTS="$JAVA_OPTS $CALCULATED_MEMORY" && MALLOC_ARENA_MAX=2 SERVER_PORT=$PORT eval exec $PWD/.java-buildpack/open_jdk_jre/bin/java $JAVA_OPTS -version
Note how I just changed the end, where the actual Java arguments are set.
The commands are quite long, but they do end up working and should do the trick for you.
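Since editing such a long command by hand is error-prone, the "change only the end" step can also be scripted. This is a rough sketch, not anything the buildpack provides: task_command is a made-up helper, and the marker it searches for is an assumption taken from the Spring Music command quoted above, so other buildpack versions may need a different one.

```python
def task_command(start_command, task_args, marker="/bin/java $JAVA_OPTS "):
    """Replace everything after the final `java $JAVA_OPTS` with task_args.

    `marker` is an assumption based on the Spring Music command quoted
    above; other buildpack versions may emit a different tail.
    """
    head, sep, _tail = start_command.rpartition(marker)
    if not sep:
        raise ValueError("marker not found in start command")
    return head + sep + task_args


# Shortened, hypothetical stand-in for the full buildpack command.
app_command = (
    'JAVA_OPTS="-agentpath:$PWD/.java-buildpack/open_jdk_jre/bin/'
    'jvmkill-1.16.0_RELEASE=printHeapHistogram=1 $JAVA_OPTS" '
    "&& eval exec $PWD/.java-buildpack/open_jdk_jre/bin/java $JAVA_OPTS "
    "-cp $PWD/. org.springframework.boot.loader.WarLauncher"
)
print(task_command(app_command, "-version"))
```

The jvmkill agent path and the memory settings at the front of the command are left untouched; only the class path and main class at the end are swapped for the task's arguments.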
I have a dag which checks for new workflows to be generated (Dynamic DAG) at a regular interval and if found, creates them. (Ref: Dynamic dags not getting added by scheduler )
The above DAG is working and the dynamic DAGs are getting created and listed in the web-server. Two issues here:
When clicking on the DAG in the web UI, it says "DAG seems to be missing".
The DAGs listed there are not listed by the "airflow list_dags" command.
Error:
DAG "app01_user" seems to be missing.
The same is for all other dynamically generated DAGs. I have compiled the Python script and found no errors.
Edit1:
I tried clearing all data and running "airflow run". It ran successfully, but no dynamically generated DAGs were added to "airflow list_dags". However, running the "airflow list_dags" command loaded and executed the DAG (which generated the dynamic DAGs). The dynamic DAGs are then listed as below:
[root@cmnode dags]# airflow list_dags
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
[2019-08-13 00:34:31,692] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=15, pool_recycle=1800, pid=25386
[2019-08-13 00:34:31,877] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-08-13 00:34:32,113] {__init__.py:305} INFO - Filling up the DagBag from /root/airflow/dags
/usr/lib/python2.7/site-packages/airflow/operators/bash_operator.py:70: PendingDeprecationWarning: Invalid arguments were passed to BashOperator (task_id: tst_dyn_dag). Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
*args: ()
**kwargs: {'provide_context': True}
super(BashOperator, self).__init__(*args, **kwargs)
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
app01_user
app02_user
app03_user
app04_user
testDynDags
Upon running the command again, all four generated DAGs above disappeared and only the base DAG, "testDynDags", is displayed.
When I was getting this error, there was an exception showing up in the webserver logs. Once I resolved that error and I restarted the webserver it went through normally.
From what I can see, this is the error thrown when the webserver tries to parse the DAG file and hits an error. In my case it was an error importing a new operator I added to a plugin.
Usually I check the Airflow UI first, since the reason for a broken DAG sometimes appears there. If it does not, I run the DAG's .py file directly, and the error (the reason the DAG can't be parsed) shows up.
I never got to work on dynamic DAG generation, but I did face this issue when a DAG was not present on all nodes (scheduler, worker, and webserver). If you have an Airflow cluster, please make sure that the DAG is present on all Airflow nodes.
Same error; the reason was that I renamed my dag_id to uppercase, something like "import_myclientname" to "import_MYCLIENTNAME".
I am a little late to the party, but I faced this error today.
In short: try executing airflow dags report and/or airflow dags reserialize
Check out my comment here:
https://stackoverflow.com/a/73880927/4437153
I found that airflow fails to recognize a dag defined in a file that does not have from airflow import DAG in it, even if DAG is not explicitly used in that file.
For example, suppose you have two files, a.py and b.py:
# a.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

def makedag(dag_id="a"):
    with DAG(dag_id=dag_id) as dag:
        DummyOperator(task_id="nada")
    return dag  # return the DAG so the module-level name below is not None

dag = makedag()
and
# b.py
from a import makedag
dag = makedag(dag_id="b")
Then airflow will only look at a.py. It won't even look at b.py at all, even to notice if there's a syntax error in it! But if you add from airflow import DAG to it and don't change anything else, it will show up.
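This matches Airflow's safe-mode DAG discovery, which, as far as I know, skips any file whose text does not contain both "airflow" and "dag" before the file is ever imported (it can be switched off with the dag_discovery_safe_mode option in [core]). The function below is a simplified imitation, not Airflow's actual code, and looks_like_dag_file is a made-up name; it assumes the case-insensitive comparison used by newer versions.

```python
def looks_like_dag_file(source: bytes) -> bool:
    """Simplified imitation of the safe-mode check: both words must appear."""
    content = source.lower()
    return all(word in content for word in (b"dag", b"airflow"))


# b.py from the example: mentions "dag" but never "airflow".
b_py = b'from a import makedag\ndag = makedag(dag_id="b")\n'

print(looks_like_dag_file(b_py))                                 # False
print(looks_like_dag_file(b"from airflow import DAG\n" + b_py))  # True
```

Under this heuristic b.py is filtered out purely on its text, which is why even a syntax error in it goes unnoticed.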