The DAG code in my git repository and the code shown in the Airflow web UI are different.
The change is clearly present in the git code, but it doesn't seem to have been picked up by the Airflow web UI.
Deleting the DAG and re-creating it does not change anything.
Is it possible to solve this problem?
I am running Airflow 2.0, setting up an Airflow DAG for the first time, and following the quick start tutorial.
After creating the py file and running it, I don't see the DAG created; it does not show up when I list the DAGs.
Setting in airflow.cfg:
dags_folder = /Users/vik/src/airflow/dags
My Python file is in this folder, and there are no errors in it.
I am able to see the example DAGs from example_dags.
I ran:
airflow db init
airflow webserver
airflow scheduler
and then tried to list the DAGs.
I think I am missing something.
I don't know exactly how you installed everything, but I highly recommend the Astronomer CLI for simplicity and a quick setup. With that you'll be able to set up a first DAG pretty quickly. There is also a video tutorial that helps you understand how to install and set everything up.
A few things to try:
Make sure the scheduler is running (run airflow scheduler) or try to restart it.
Using the Airflow CLI, run airflow config list and make sure that the loaded config is in fact what you are expecting, check the value of dags_folder.
Try running airflow dags list from the CLI, and if your DAG is shown in the results, check its filepath.
If there was an error parsing your DAG, meaning it could not be loaded by the scheduler, you can find the logs in ${AIRFLOW_HOME}/logs/scheduler/${date}/your_dag_id.log
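As a sanity check, you can also drop a minimal DAG like the sketch below into your dags_folder and see whether it shows up in airflow dags list (a minimal sketch, assuming Airflow 2.x; the file name, dag_id, and task are illustrative):

# sanity_check_dag.py -- minimal sketch, assuming Airflow 2.x; names are illustrative
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id="sanity_check",            # should appear in `airflow dags list`
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,           # no automatic runs; trigger manually
) as dag:
    DummyOperator(task_id="noop")

If even this file doesn't show up, the problem is in the configuration (dags_folder or the scheduler) rather than in your DAG code.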
I'm developing a DAG on Cloud Composer; my code is separated into a main Python file and one package with subfolders. It looks like this:
my_dag1.py
package1/__init__.py
package1/functions.py
package1/package2/__init__.py
package1/package2/more_functions.py
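For context, the import pattern looks roughly like this (a sketch; the function and argument names are illustrative, not the actual code):

# package1/functions.py -- sketch; names are illustrative
def my_function(existing_argument, new_argument=None):
    # updated to accept the additional keyword argument
    return existing_argument, new_argument

# my_dag1.py -- sketch
from package1.functions import my_function

my_function("value", new_argument="other value")  # call site updated accordingly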
I updated one of the functions in package1/functions.py to take an additional argument (and updated the reference in my_dag1.py). The code ran correctly in my local environment and I was not getting any errors when running
gcloud beta composer environments run my-airflow-environment list_dags --location europe-west1
But the web UI raised a Python error:
TypeError: my_function() got an unexpected keyword argument 'new_argument'
I tried renaming the function and the error changed to
NameError: name 'my_function' is not defined
I tried changing the name of the DAG and uploading the files to the DAG folder both zipped and unzipped, but nothing worked.
The error disappeared only after I renamed the package folder.
I suspect the issue is related to the scheduler picking up my_dag1.py but not package1/functions.py. The error appeared out of nowhere, as I had made similar updates in previous weeks.
Any idea on how to fix this issue without refactoring the whole code structure?
EDIT-1
Here's the link to the related discussion on Google Groups.
I've run into a similar issue: the "Broken DAG" error won't dismiss in the web UI. I suspect this is a cache bug in the Airflow web server.
Background:
I created a customized operator using the Airflow plugin feature.
After I imported the customized operator, the Airflow web UI kept showing a Broken DAG error saying that it can't find the customized operator.
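For reference, the plugin setup looks roughly like this (a sketch assuming the Airflow 1.x plugin mechanism; the class, module, and log message are illustrative):

# plugins/my_plugin.py -- sketch; names are illustrative
from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin

class MyCustomOperator(BaseOperator):
    def execute(self, context):
        self.log.info("running the customized operator")

class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    operators = [MyCustomOperator]

# and in the DAG file, the 1.x-style import goes through the plugin manager:
# from airflow.operators.my_plugin import MyCustomOperator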
Why do I think it's a bug in the Airflow web server?
I can manually run the task with the airflow test command, so the import should be correct.
Even if I remove the related DAG file from the dags/ folder of Airflow, the error is still there.
Here is what I did to resolve this issue:
Restart the Airflow webserver service (sometimes this alone resolves the issue).
Make sure no DAG is running, then restart the Airflow scheduler service.
Make sure no DAG is running, then restart the Airflow worker.
Hopefully this can help someone who has the same issue.
Try restarting the webserver with:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION
I am very new to Airflow; I have set up everything according to what is stated on their website. However, I find it very confusing to figure out my DAG folder location. NOTE: I configured dags_folder in airflow.cfg to /airflow/dags, and this folder has two files:
/airflow/dags
---dag1.py
---dag2.py
But when I run airflow list_dags, it still shows the DAGs inside the example_dags folder in
usr/local/lib/python2.7/dist_packages/airflow/example_dags
How can I see which path airflow list_dags is using, and how can I change it? Any help is appreciated.
There is an airflow.cfg value under the [core] section called load_examples that you can set to False to exclude the example DAGs. That should clean up the output you're seeing from list_dags.
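For example, in airflow.cfg (the dags_folder path here is illustrative):

[core]
dags_folder = /airflow/dags
load_examples = False

Note that you need to restart the webserver and scheduler for the change to take effect.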
I'm having trouble updating a DAG file. Airflow still has an old version of my DAG. I added a task, but it seems not to be updated when I check the log and the UI (DAG → Code).
I have very simple tasks.
I have, of course, checked the DAG directory path in airflow.cfg and restarted the Airflow webserver/scheduler.
I have no issue running it (but with the old DAG file).
This looks like a bug in Airflow. A temporary solution is to delete the task instances for the removed task from the Airflow DB with
delete from task_instance where dag_id='<dag_name>' and task_id='<deleted_task_name>';
This should be simpler and less impactful than the resetdb route, which would delete everything, including variables and connections set before.
Use the terminal and run the command below soon after changing the DAG file:
airflow initdb
This worked for me.
You can try removing the old .pyc file for that DAG in the dags folder; it will be regenerated.
The UI is sometimes not up to date for me, but the code is actually there in the DAG bag. You can try to:
Use the refresh button to see if the code refreshed.
Use a higher version, 1.8+; this happened to me before when I used 1.7.x, but since 1.8+ it seems much better after you refresh the DAG in the UI.
You can also use airflow test to see if the code is in place, and try the advice from #Him as well.
The same thing happened to me.
In the end, the best approach was to resetdb, add the connections and import the variables again, then run airflow initdb and set the scheduler back up again.
I don't know why this happens; does anybody know? It seems not so easy to add tasks or change names once compiled. Removing *.pyc files or the logs folder did not work for me.
In the DAG page of the Airflow webserver, delete the DAG. This deletes its record in the database. After a while the DAG will appear again in the page, but with the old task_id removed.
I'm new to CodeDeploy. I managed to make a deployment to an EC2 instance successfully (and I'm using git to manage the code, so everything works beautifully now).
I want some other people besides myself who work on the project to be able to deploy source code to the instance, but not to be able to run a script (especially because CodeDeploy seems to run as root). Think of it as an admin/webmaster scenario.
In other words, appspec.yml has the "hooks" section, under which you can run arbitrary scripts as part of the deployment. I want to prevent this; the instance already has all the software ready for the deployment, so these hooks won't be needed.
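For reference, the kind of hooks section I mean looks like this (a sketch; the destination path and script name are illustrative):

# appspec.yml -- sketch; paths and script names are illustrative
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/my-app
hooks:
  AfterInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root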
Two questions:
1) Does this make sense, or am I grossly misunderstanding something? Is using CodeDeploy at all overkill here?
2) If it makes sense, how can I achieve this?
This doesn't seem to be something that CodeDeploy can do at the moment. But do you want to disable the automatic deployment from GitHub to CodeDeploy? If anyone else pushes a code change, it will exist on GitHub; when you are OK with the changes, you can do a manual deployment from GitHub in the CodeDeploy console.