I got the webserver to work properly, and my airflow-scheduler.service file starts the scheduler and it finds my DAGs, etc. However, tasks are not running:
I see an error message about /bin/sh:
ERROR - failed to execute task Command 'exec bash -c run'
I have my sysconfig file:
#!/bin/bash
PATH=/opt/anaconda/anaconda3/envs/airflow_env/bin/airflow
AIRFLOW_CONFIG=/mnt/var/airflow/airflow.cfg
AIRFLOW_HOME=/mnt/var/airflow
And my airflow-scheduler.service file:
#!/bin/bash
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/anaconda/anaconda3/envs/airflow_env/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
Here is the journalctl record which shows the bash error I am getting:
[2017-10-30 18:36:13,764] {base_executor.py:50} INFO - Adding to queue: airflow run user_presence_raw_etl transform_raw_user_presence 2017-10-30T14:00:00 --local -sd /mnt/var/airflow/dags/bin/user_p
Oct 30 18:36:13 airflow[4742]: [2017-10-30 18:36:13,765] {jobs.py:1443} INFO - Heartbeating the executor
Oct 30 18:36:13 airflow[4742]: [2017-10-30 18:36:13,783] {local_executor.py:45} INFO - LocalWorker running airflow run user_presence_raw_etl transform_raw_user_presence 2017-10-30T14:00:00 --local -sd /mnt/var/airflow/dags/bin/us
Oct 30 18:36:13 airflow[4742]: /bin/sh: 1: exec: bash: not found
Oct 30 18:36:13 airflow[4742]: [2017-10-30 18:36:13,865] {local_executor.py:52} ERROR - failed to execute task Command 'exec bash -c 'airflow run user_presence_raw_etl transform_raw_user_presence 2017-10-30T14:00:00 --local -sd /mnt/var/airf
Oct 30 18:36:14 airflow[4742]: [2017-10-30 18:36:14,786] {jobs.py:1407} INFO - Heartbeating the process manager
Oct 30 18:36:14 airflow[4742]: [2017-10-30 18:36:14,786] {dag_processing.py:559} INFO - Processor for /mnt/var/airflow/dags/bin/prod/hourly_agent_dag.py finished
Oct 30 18:36:14 airflow[4742]: [2017-10-30 18:36:14,789] {dag_processing.py:627} INFO - Started a process (PID: 5425) to generate tasks for /mnt/var/airflow/dags/bin/prod/daily_agent_email_dag.py - logging into /mnt/var/airflow/l
Oct 30 18:36:14 airflow[4742]: [2017-10-30 18:36:14,831] {jobs.py:1000} INFO - No tasks to send to the executor
Oct 30 18:36:14 airflow[4742]: [2017-10-30 18:36:14,832] {jobs.py:1443} INFO - Heartbeating the executor
Oct 30 18:36:14 airflow[4742]: [2017-10-30 18:36:14,833] {jobs.py:1195} INFO - Executor reports user_presence_raw_etl.transform_raw_user_presence execution_date=2017-10-30 14:00:00 as failed
Looks like your sysconfig file is clobbering the $PATH environment variable, hence the error:
Oct 30 18:36:13 airflow[4742]: /bin/sh: 1: exec: bash: not found
Try setting that line to the following instead, pointing at the environment's bin directory (a directory, not the airflow binary itself) and keeping the rest of the PATH:
PATH=/opt/anaconda/anaconda3/envs/airflow_env/bin:$PATH
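One caveat: systemd does not perform variable expansion when reading an EnvironmentFile, so the $PATH on the right-hand side may end up as a literal string rather than the inherited value. A minimal sketch of the sysconfig file with the standard directories spelled out explicitly instead (paths taken from the question; the shebang line is unnecessary because the file is parsed by systemd, not executed):
PATH=/opt/anaconda/anaconda3/envs/airflow_env/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_CONFIG=/mnt/var/airflow/airflow.cfg
AIRFLOW_HOME=/mnt/var/airflow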
It looks like you installed airflow as a user other than airflow. I have an airflow-scheduler.service file that specifies the user I used to install airflow, in my case mghen:
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=mghen
Group=mghen
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
I fought with a similar issue but didn't see this same error message. Anyway, changing the User and Group might help. Since you specify an absolute path when you start airflow (ExecStart=/opt/anaconda/anaconda3/envs/airflow_env/bin/airflow scheduler), it looks like it starts fine as the airflow user, but that airflow user can't actually run the airflow program because another user installed it. I don't know how to fix the PATH; I just changed User and Group in my airflow-*.service files.
Alternatively, perhaps you can install airflow as the airflow user so it's available in the airflow user's PATH.
Worker processes spawned by the LocalExecutor cannot run airflow because it isn't on their PATH; you only configure it in the sysconfig file. In the example airflow-scheduler.service provided in the Airflow repository, airflow is expected to be executable by the service user.
My suggestion is to create a virtualenv for airflow, install airflow into it, and then activate it in each shell where you want to run anything related to airflow, e.g. workers, the scheduler, or the webserver.
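A rough sketch of that approach, assuming a virtualenv at /opt/airflow_venv (the path is only an example); for the systemd units you would still point the PATH in the EnvironmentFile at the venv's bin directory so spawned workers can find the airflow executable:
# create the virtualenv and install airflow into it (run as the airflow user)
python3 -m venv /opt/airflow_venv
/opt/airflow_venv/bin/pip install apache-airflow
# activate it in each shell where you run an airflow component
source /opt/airflow_venv/bin/activate
airflow webserver
airflow scheduler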
Related
I upgraded Airflow from version 1.7.3 to 1.10.1. After upgrading the scheduler, webserver and workers, the DAGs have stopped working, showing the error below on the scheduler:
Either the dag did not exist or it failed to parse.
I have not made any changes to the config. While investigating, the scheduler logs show the issue. Earlier the scheduler ran the task as:
Adding to queue: airflow run <dag_id> <task_id> <execution_date> --local -sd DAGS_FOLDER/<dag_filename.py>
While now it is running with an absolute path:
Adding to queue: airflow run <dag_id> <task_id> <execution_date> --local -sd /<PATH_TO_DAGS_FOLDER>/<dag_filename.py>
PATH_TO_DAGS_FOLDER is something like /home/<user>/Airflow/dags...
which is the same path that gets pushed to the workers; since the workers run as a different user, they cannot find the DAG location specified.
I am not sure how to tell the worker to look in its own Airflow home directory and not the scheduler's.
I am using MySQL as the backend and RabbitMQ for message passing.
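For context, a minimal sketch of the airflow.cfg keys involved, assuming the same layout exists on both the scheduler and the worker hosts (the paths are placeholders, not taken from the question):
[core]
# must resolve to the same DAG files on every machine that runs tasks
airflow_home = /home/airflow/Airflow
dags_folder = /home/airflow/Airflow/dags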
I'm having trouble starting my docker daemon. I've installed docker but when I try to run # systemctl start docker.service it throws an error.
$ systemctl status docker.service gives me this:
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2016-09-21 14:38:24 CEST; 6s ago
Docs: https://docs.docker.com
Process: 5592 ExecStart=/usr/bin/dockerd -H fd:// (code=exited, status=1/FAILURE)
Main PID: 5592 (code=exited, status=1/FAILURE)
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.271068176+02:00" level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored."
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.327814644+02:00" level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.329895994+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.330707721+02:00" level=info msg="Loading containers: start."
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.335610867+02:00" level=info msg="Firewalld running: false"
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.461243263+02:00" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to parse pool request for ad
Sep 21 14:38:24 tp-x230 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Sep 21 14:38:24 tp-x230 systemd[1]: Failed to start Docker Application Container Engine.
Sep 21 14:38:24 tp-x230 systemd[1]: docker.service: Unit entered failed state.
Sep 21 14:38:24 tp-x230 systemd[1]: docker.service: Failed with result 'exit-code'.
with the relevant line being:
Sep 21 14:38:24 tp-x230 dockerd[5592]: time="2016-09-21T14:38:24.461243263+02:00" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to parse pool request for ad
Your error text was cut off, so I can't check whether it is exactly the same error, but I got this one:
Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to parse pool request for address space "LocalDefault" pool "" subpool "": could not find an available predefined network
This was related to the machine having several network cards (it can also happen on machines with a VPN; you can temporarily stop the VPN, start Docker, and restart the VPN, or apply the workaround below).
For me, the solution was to start Docker manually like this:
/usr/bin/docker daemon --debug --bip=192.168.y.x/24
where 192.168.y.x is the main machine's IP and /24 its netmask. Docker will use this network range for building the bridge and firewall rules. The --debug flag is not really needed, but might help if something else fails.
After starting it once, you can kill Docker and start it as usual. AFAIK Docker caches the configuration for that --bip and should now work without it. Of course, if you clean the Docker cache, you may need to do this again.
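If you would rather make the setting persistent instead of relying on that cached state, a minimal sketch using /etc/docker/daemon.json (the address is a placeholder, as above; adjust it to your own network):
{
  "bip": "192.168.y.x/24"
}
Then restart the daemon with systemctl restart docker.service.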
This worked for me, from https://github.com/microsoft/WSL/issues/6655
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo dockerd &
Apparently deleting
/var/lib/docker/network/files/local-kv.db
could work in some cases.
Source: https://github.com/moby/moby/issues/18113#issuecomment-161058473
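If you go that route, something along these lines (note this resets Docker's local network state, so custom networks may need to be recreated):
sudo systemctl stop docker.service
sudo rm /var/lib/docker/network/files/local-kv.db
sudo systemctl start docker.service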
Another explanation could be problem caused by using VPN: https://github.com/moby/moby/issues/31546#issuecomment-284196714
Turns out I needed to enable IP forwarding for the network that Docker was trying to use, namely the bridge network.
This was done by creating the file /etc/systemd/network/bridge.network with the content
[Network]
IPForward=kernel
and then restarting the systemd-networkd daemon with # systemctl restart systemd-networkd.service. After this, # systemctl start docker.service worked fine.
P.S. After restarting the network daemon, I was disconnected from my network (as one might expect) and had to manually connect. Might be worth considering if you've got something important going on.
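A quick way to verify whether forwarding actually got enabled (assuming sysctl is available; a value of 1 means forwarding is on):
sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward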
I am trying to install MariaDB and am getting the following issue.
[root@localhost ~]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
Job for mariadb.service failed. See 'systemctl status mariadb.service' and 'journalctl -xn' for details.
I tried 'systemctl status mariadb.service' and 'journalctl -xn'; the details follow.
[root@localhost ~]# systemctl status mariadb.service
mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled)
Active: failed (Result: exit-code) since Sun 2014-09-21 17:19:44 IST; 23s ago
Process: 2712 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)
Process: 2711 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
Process: 2683 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
Main PID: 2711 (code=exited, status=0/SUCCESS)
Sep 21 17:19:42 localhost.localdomain mysqld_safe[2711]: 140921 17:19:42 mysqld_safe Logging to '/var/lib/mysql/localhost.localdomain.err'.
Sep 21 17:19:42 localhost.localdomain mysqld_safe[2711]: 140921 17:19:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Sep 21 17:19:43 localhost.localdomain mysqld_safe[2711]: 140921 17:19:43 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdoma...d ended
Sep 21 17:19:44 localhost.localdomain systemd[1]: mariadb.service: control process exited, code=exited status=1
Sep 21 17:19:44 localhost.localdomain systemd[1]: Failed to start MariaDB database server.
Sep 21 17:19:44 localhost.localdomain systemd[1]: Unit mariadb.service entered failed state.
[root@localhost ~]# journalctl -xn
-- Logs begin at Sun 2014-09-21 02:33:29 IST, end at Sun 2014-09-21 17:20:11 IST. --
Sep 21 17:16:26 localhost.localdomain systemd[1]: Started dnf makecache.
-- Subject: Unit dnf-makecache.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dnf-makecache.service has finished starting up.
--
-- The start-up result is done.
Sep 21 17:18:11 localhost.localdomain NetworkManager[683]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
Sep 21 17:19:42 localhost.localdomain systemd[1]: Starting MariaDB database server...
-- Subject: Unit mariadb.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has begun starting up.
Sep 21 17:19:42 localhost.localdomain mysqld_safe[2711]: 140921 17:19:42 mysqld_safe Logging to '/var/lib/mysql/localhost.localdomain.err'.
Sep 21 17:19:42 localhost.localdomain mysqld_safe[2711]: 140921 17:19:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Sep 21 17:19:43 localhost.localdomain mysqld_safe[2711]: 140921 17:19:43 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid end
Sep 21 17:19:44 localhost.localdomain systemd[1]: mariadb.service: control process exited, code=exited status=1
Sep 21 17:19:44 localhost.localdomain systemd[1]: Failed to start MariaDB database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
--
-- The result is failed.
Sep 21 17:19:44 localhost.localdomain systemd[1]: Unit mariadb.service entered failed state.
Sep 21 17:20:11 localhost.localdomain NetworkManager[683]: <warn> nl_recvmsgs() error: (-33) Dump inconsistency detected, interrupted
Can anyone please help?
I have tried uninstalling and reinstalling many times but received the same error.
Thanks in advance.
Most of the time, if the system journal (journalctl) doesn't show what the problem was, the MariaDB error log (located at /var/lib/mysql/localhost.localdomain.err) does. Looking into that file, you usually see what the problem is.
Most commonly, errors that do not disappear after a reinstallation mean that your data directory (by default /var/lib/mysql/) is corrupted and the database needs to be reinitialized with mysql_install_db. To make sure you do a clean installation, remove all files located in the data directory and then run sudo mysql_install_db --user=mysql.
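A rough sketch of that clean re-initialization (destructive: it wipes all existing databases, so back up anything you need first; paths assume the default data directory):
sudo systemctl stop mariadb
sudo rm -rf /var/lib/mysql/*
sudo mysql_install_db --user=mysql
sudo systemctl start mariadb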
I solved it as follows.
After installing, run:
mysql_install_db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
Then:
mysql_secure_installation
And then:
systemctl start mariadb
With this, it was resolved.
A quick update for anyone coming here through a Web search.
I had a "failed to start" message following a Debian 9 -> Debian 10 (Buster) in-place server upgrade, and after a bit of digging I found that the following line in /etc/mysql/my.cnf needed updating:
From:
[mysqld]
: (other stuff)
:
innodb_large_prefix
To:
[mysqld]
: (other stuff)
:
innodb_large_prefix = "ON"
The clue was the following lines in /var/log/mysql/error.log
2020-06-06 16:41:24 0 [ERROR] /usr/sbin/mysqld: option '--innodb-large-prefix' requires an argument
2020-06-06 16:41:24 0 [ERROR] Parsing options for plugin 'InnoDB' failed.
Not sure about your case, but you can check whether the mariadb/mysql client was accidentally deleted.
In my case, I had deleted the MariaDB client repo from a shared file, so I reinstalled the client as:
sudo apt-get install libmariadb-dev
Note: before installing the client, for a Rails app just change the mysql version in the Gemfile and try bundle install for mysql2. If it is a MariaDB client issue, it will throw an error saying you need to install the MariaDB client or mysql2 client, as:
sudo apt-get install libmariadb-dev
OR
sudo apt-get install libmysqlclient-dev
Please refer to the other answers in case it's not a MariaDB client issue 😊 😜
This solved the problem for me: another mysqld process was already listening on port 3306, which prevented mariadb from starting. Check with:
netstat -tulpn | grep LISTEN
Now we need to look for the mysqld service:
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 32029/mysqld
Now kill the process; in my case it is 32029:
kill 32029
Now use systemctl start mariadb.service
I tried to integrate Plone with a systemd-based startup (on openSUSE 12.3).
As a first attempt I have a very simple plone.service file:
[Unit]
Description=Plone content management system
After=network.target
[Service]
Type=simple
ExecStart=/srv/plone/zeocluster/bin/plonectl start
[Install]
WantedBy=multi-user.target
Checking with systemctl status plone I see that the processes get started, but they immediately vanish again. I've also tried Type=Daemon, but the end result is the same.
Any hints on where my error is?
The service actually finds and executes the plonectl script successfully; the processes just die quickly:
linux-wezo:/etc/systemd/system # systemctl start plone.service
linux-wezo:/etc/systemd/system # systemctl status plone.service
plone.service - Plone content management system
Loaded: loaded (/etc/systemd/system/plone.service; disabled)
Active: inactive (dead) since Mon, 2013-03-18 22:00:50 CET; 1s ago
Process: 25494 ExecStart=/srv/plone/zeocluster/bin/plonectl start (code=exited, status=0/SUCCESS)
CGroup: name=systemd:/system/plone.service
Mar 18 22:00:42 linux-wezo.site systemd[1]: Starting Plone content management system...
Mar 18 22:00:42 linux-wezo.site systemd[1]: Started Plone content management system.
Mar 18 22:00:43 linux-wezo.site plonectl[25494]: zeoserver: .
Mar 18 22:00:43 linux-wezo.site plonectl[25494]: daemon process started, pid=25502
Mar 18 22:00:46 linux-wezo.site plonectl[25494]: client1: .
Mar 18 22:00:46 linux-wezo.site plonectl[25494]: daemon process started, pid=25507
Mar 18 22:00:49 linux-wezo.site plonectl[25494]: client2: .
Mar 18 22:00:49 linux-wezo.site plonectl[25494]: daemon process started, pid=25522
I do have a SysV-style init script that works via systemctl, but I think it would be great to have a service file, since that should be more generic than the various init scripts floating around.
The issue is that plonectl is not a daemon; it is a wrapper script that starts Zope. You need to change the type to forking and probably tell systemd where to find the PID file.
Plonectl forks the daemon. Try this in plone.service:
[Unit]
Description=Plone content management system
After=network.target
ConditionPathExists=/srv/plone/zeocluster/bin/plonectl
[Service]
Type=forking
ExecStart=/srv/plone/zeocluster/bin/plonectl start
ExecStop=/srv/plone/zeocluster/bin/plonectl stop
ExecReload=/srv/plone/zeocluster/bin/plonectl restart
[Install]
WantedBy=multi-user.target
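After placing the unit in /etc/systemd/system, the usual reload and enable steps apply (assuming the file is named plone.service):
systemctl daemon-reload
systemctl enable plone.service
systemctl start plone.service
systemctl status plone.service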
I have an embedded Linux board that uses systemd for startup processes. I also have a GUI written in Qt that I can launch just fine from the command line and interact with using the board's touchscreen or buttons. To launch the app I usually do:
ssh root@192.168.1.2
cd ~/
./gui
I'd like this to start up automatically using systemd, so I wrote a service file that looks like this:
[Unit]
Description=The Qt Gui
After=dropbear.service systemd-logind.service
ConditionFileIsExecutable=/home/user/gui
[Service]
ExecStart=/home/user/gui
Restart=on-abort
[Install]
WantedBy=multi-user.target
When the board boots I see in the systemd log that it is starting my gui right after the dropbear SSH service:
[ OK ] Started Dropbear SSH2 daemon.
Starting The Qt Gui...
[ OK ] Started The Qt Gui.
And if I SSH to the board after it boots and run 'ps', I can see my process was started (and if I kill it, it restarts as expected via systemd):
196 root 26868 S /home/user/gui
The output of systemctl status looks OK to me; note that the last line, 'ARM build', is a qDebug() print statement from my code:
gui.service - The Qt Gui
Loaded: loaded (/etc/systemd/system/gui.service; enabled)
Active: active (running) since Tue, 2012-11-20 21:30:20 UTC; 4min 35s ago
Main PID: 196 (gui)
CGroup: name=systemd:/system/gui.service
└ 196 /home/user/gui
Nov 20 21:30:20 systemd[1]: Starting The Qt Gui...
Nov 20 21:30:20 systemd[1]: Started The Qt Gui.
Nov 20 21:30:22 gui[196]: ARM build
However, I cannot interact at all with the instance of the GUI launched by systemd! If I launch a second instance from the command line, then I can press the buttons or the touchscreen, the GUI pops up on the screen, and it works as expected. What gives? I've tried Type=forking in the service file and that doesn't help either. Any ideas on what is wrong here? How can I get systemd to launch my Qt GUI just as if I did it from the command line? Thanks.
Thanks for the input above. It actually was not a working directory or timing issue. The problem was that my Qt GUI was not getting the proper environment variables that it needed for communicating with the touchscreen. Sourcing /etc/profile worked for me:
[Unit]
Description=The Qt Gui
After=dropbear.service
ConditionFileIsExecutable=/home/user/gui
[Service]
Type=simple
TimeoutStartSec=60
WorkingDirectory=/home/user
ExecStartPre=/bin/sh -c 'echo 127 > /sys/class/backlight/generic-bl/brightness'
ExecStart=/bin/sh -c 'source /etc/profile ; /home/user/gui -qws'
Restart=always
[Install]
WantedBy=multi-user.target
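As an alternative to sourcing /etc/profile, systemd can set the variables itself via Environment= or EnvironmentFile= in the unit. A sketch, where the variable names and values are only examples of the kind of Qt/Embedded touchscreen settings a profile might export (copy whatever your /etc/profile actually sets):
[Service]
Environment=QWS_MOUSE_PROTO=tslib:/dev/input/event0
Environment=TSLIB_TSDEVICE=/dev/input/event0
ExecStart=/home/user/gui -qws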