I currently have the Airflow scheduler set up on Linux server A and the Airflow webserver on Linux server B. Neither server has Internet access. I ran the DB initialization on server A and keep all the DAGs on server A.
However, when I refresh the webserver UI, it keeps showing this error message:
This DAG isn't available in the webserver DagBag object
How do I configure the DAG folder for the webserver (server B) so it can read the DAGs from the scheduler (server A)?
I am using the BashOperator. Is the CeleryExecutor a must?
Thanks in advance
The scheduler has found your dags_folder and the DAG files in it, and is scheduling them accordingly. The webserver, however, only "sees" those DAGs through their entries in the metadata database; it cannot find the corresponding files in its own dags_folder path.
You need to ensure that the dags_folder on both servers contains the same files and that the two stay in sync with one another. This is out of scope for Airflow, and it will not handle it on your behalf.
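To confirm what each side is actually reading, you can ask Airflow's own configuration on each server. This is a minimal sketch, assuming it is run under the same AIRFLOW_HOME as the scheduler/webserver daemons:

# Run on both server A and server B; the webserver only lists DAGs whose
# files exist under its own dags_folder.
import os
from airflow.configuration import conf

dags_folder = conf.get("core", "dags_folder")
print("dags_folder:", dags_folder)
if os.path.isdir(dags_folder):
    print("files:", sorted(os.listdir(dags_folder)))
else:
    print("dags_folder does not exist on this machine")

If server B's dags_folder is empty or missing, copy the DAG files there and keep them synchronized from then on (a shared mount or a periodic rsync from server A is the usual approach).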
Related
When deploying Airflow in cluster mode, it is necessary to keep the DAG files of all worker nodes consistent. When I modify a DAG file in the Airflow web UI via the airflow-code-editor plugin, only the copy on the worker node where the Airflow webserver runs is modified; the DAG files on the other worker nodes are not updated.
How can I resolve this problem?
Hi, based on the Airflow docs I was able to set up cloud/remote logging.
Remote logging works for DAG and task logs, but it does not back up or remotely store the following logs:
scheduler
dag_processing_manager
I am using the Docker Hub Airflow image.
I was wondering whether Airflow's scheduler and webserver daemons can be launched on different server instances.
And if that is possible, why not use a serverless architecture for the Flask webserver?
There are plenty of resources about multi-node clusters for workers, but I found nothing about splitting the scheduler and the webserver.
Has anyone already done this? What difficulties might I face?
I would say the minimum requirement would be that both instances have:
Read(-write) access to the same AIRFLOW_HOME directory (for accessing DAG scripts and the shared config file)
Access to the same database backend (for accessing shared metadata)
Exactly the same Airflow version (to prevent any potential incompatibilities)
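One quick way to check the second and third points is to run a small snippet on both machines and compare the output; this is just a sketch that assumes the standard Airflow configuration API:

# Run on both the scheduler and the webserver machine; the output should match.
# Note: in newer Airflow releases the DB connection string lives in the
# [database] section rather than [core].
import airflow
from airflow.configuration import conf

print("Airflow version:", airflow.__version__)
print("metadata DB:", conf.get("core", "sql_alchemy_conn"))
print("dags_folder:", conf.get("core", "dags_folder"))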
Then just try it out and report back (I am really curious ;) ).
I have an Airflow service running normally on a remote machine, which can be accessed in a browser at the URL http://airflow.xxx.com
Now I want to dynamically upload DAGs from another machine to the Airflow instance at airflow.xxx.com and have those DAGs run automatically.
After reading the Airflow documentation (http://airflow.incubator.apache.org/), I found a way to dynamically create DAGs and run them automatically, but it has to be done on the Airflow machine airflow.xxx.com itself.
I want to do this from another machine, though. How can I accomplish that? Is there something like WebHDFS that lets me send commands directly to the remote Airflow?
You should upload your new DAG into the Apache Airflow DAG directory.
If you did not set Airflow up in a cluster environment, the webserver, scheduler and worker all run on the same machine.
On that machine, if you did not amend airflow.cfg, your DAG directory is the default: dags_folder = /usr/local/airflow/dags
If you can access the Airflow machine from the other machine over SFTP (or FTP), you can simply put the file in that directory.
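If SFTP access is available you can also script the upload; below is a minimal sketch using the paramiko library, where the hostname, username, key path and the /usr/local/airflow/dags path are assumptions to be replaced with your own values:

# Sketch: push a DAG file to the remote Airflow machine over SFTP.
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("airflow.xxx.com", username="airflow", key_filename="/home/me/.ssh/id_rsa")

sftp = ssh.open_sftp()
sftp.put("my_new_dag.py", "/usr/local/airflow/dags/my_new_dag.py")
sftp.close()
ssh.close()

The scheduler picks up new files in dags_folder on its next parsing cycle, so no restart is needed; the DAG then runs on its schedule or can be triggered from the UI or CLI.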
I have an ASP.NET web application hosted on two different IIS web servers (Server A and Server B) for HTTP request clustering. The application lets a user create and kick off (run) a manual Windows Task Scheduler task on the IIS web server where the website is hosted (but in my case it is hosted on two different web servers for load balancing).
The first time the user creates a task from the web UI, the HTTP request goes to Server A, so the manual Windows scheduled task is created on Server A. The next time, when the user tries to kick off that task, the HTTP request goes to Server B, but there is no such task on Server B (the first request created it on Server A). The second request therefore cannot find the task on Server B and displays an alert saying that no Windows scheduled task was found.
As shown below, Server A has one task, MyScheduler, but Server B does not have any task with the same name.
How can I get around this challenge?
After a lot of research and development, I found that the Windows Task Scheduler API allows us to define the target server on which to create, read, run and delete tasks.
The TaskService.TargetServer property gets the name of the computer that is running the Task Scheduler service the user is connected to. You can find more details about TaskService here:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa383474(v=vs.85).aspx
Suppose we have Server A and Server B as mentioned in the question above. We can then define Server A as the target server, so the task is always created on a single machine in the cluster environment.
Example:
// Connect to the Task Scheduler service on "Server A" and manage tasks there
TaskService taskService = new TaskService(targetServer: "Server A");