Remove multiple Airflow installs - airflow

I think I have multiple Airflow installs, and they might be conflicting with each other. I restarted airflow-webserver and airflow-scheduler. When I run "airflow scheduler" it says:
Connection in use: ('0.0.0.0', 8793).
My airflow.cfg is in this location: /home/airflow/airflow
It shows:
dags_folder = /var/foo/dags
endpoint_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080
base_url = http://airflow.foo.lan:8080
On the server there is also another airflow folder inside /home/dev/.
Its airflow.cfg shows:
dags_folder = /home/dev/airflow/dags
endpoint_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080
base_url = http://localhost:8080
Does one need to be removed or turned off?

You can specify your Airflow home by setting the environment variable AIRFLOW_HOME:
AIRFLOW_HOME=/path/to/home airflow scheduler
AIRFLOW_HOME=/path/to/home airflow webserver
and Airflow will use the configuration at $AIRFLOW_HOME/airflow.cfg. The "Connection in use: ('0.0.0.0', 8793)" message just means something is already listening on the worker log-serving port (worker_log_server_port, 8793 by default), most likely an Airflow process that is still running.
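If you want to keep both directories around, the important thing is that every Airflow process is started with the same home. A minimal sketch (assuming the services are launched from a shell or wrapper script you control; the path is simply the one from your first config):
# pick ONE home and use it for every Airflow process
export AIRFLOW_HOME=/home/airflow/airflow
airflow webserver
airflow scheduler
The install under /home/dev/airflow then doesn't strictly need to be removed, as long as nothing starts Airflow pointing at that home.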

Related

gitlab-runner Could not resolve host: gitlab.com

I have a Raspberry Pi with gitlab-runner installed (Linux version) and a git repository on gitlab.com (not self-hosted).
At the beginning of the pipeline, gitlab-runner on the Raspberry Pi tries to fetch the .git repo, but I get:
Could not resolve host: gitlab.com
I tried:
ping gitlab.com works fine on the Raspberry Pi
Adding extra_host = ['localhost:my.ip.ad.ress'] --> no change
Adding network_mode = "gitlab_default", which gives this error:
Error response from daemon: network gitlab_default not found (exec.go:57:1s)
I am in the simplest configuration, with the repo on gitlab.com and a gitlab-runner on the Raspberry Pi. How can I deal with this?
Here is the config.toml:
concurrent = 1
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "gitlab runner on raspberryPi"
  url = "https://gitlab.com/"
  token = "XXXX"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "node:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
I had the same issue; my gitlab-runner was running on my local machine. I restarted Docker:
systemctl restart docker
and the error went away.
Not being able to resolve the host name can have multiple root causes:
IP forwarding disabled?
Routing might be disabled on your system. Check whether IP forwarding is enabled (== 1):
cat /proc/sys/net/ipv4/ip_forward
1
If it's disabled, the command returns 0. Enable it by editing a sysctl file, for example /etc/sysctl.d/99-sysctl.conf, and adding:
net.ipv4.conf.all.forwarding = 1
net.ipv4.ip_forward = 1
Apply the setting without rebooting: sudo sysctl --system
Important note: even if the system reports that IP forwarding is currently enabled, you should still set it explicitly in your sysctl configs. Docker runs sysctl -w net.ipv4.ip_forward=1 when the daemon starts up, but that is not a persistent setting and can cause very random issues like the one you are seeing.
DNS missing / invalid?
You can try setting a DNS server such as 8.8.8.8 to see if that fixes the problem:
[runners.docker]
dns = ["8.8.8.8"]
Add extra_host?
You can also try adding an extra host, which is mostly relevant within a local network (so not for the gitlab.com domain).
[runners.docker]
extra_hosts = ["gitlab.yourdomain.com:192.168.xxx.xxx"]
Using host network
I really do not advise this, but you could configure the Docker container to run with network_mode set to "host". Again, only do this for debugging purposes:
[runners.docker]
network_mode = "host"
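Since the executor is Docker, it can also help to narrow down whether resolution fails on the Raspberry Pi itself or only inside the containers the runner launches. A quick sanity check (a sketch; node:latest is just the image already used in the config above):
# on the Raspberry Pi itself
getent hosts gitlab.com
# inside a container based on the runner's image
docker run --rm node:latest getent hosts gitlab.com
If the first command resolves but the second does not, the problem lies in Docker's networking or DNS (forwarding, DNS servers, etc. as described above) rather than in the Pi's own resolver.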

How do I deploy Apache-Airflow via uWSGI and nginx?

I'm trying to deploy Airflow in a production environment on a server running nginx and uWSGI.
I've searched the web and found instructions on installing Airflow behind a reverse proxy, but those instructions only have nginx config examples. However, due to permissions, I can't change nginx.conf itself and have to solve this via uWSGI.
My folder structure is:
project_folder
|_airflow
| |_airflow.cfg
| |_webserver_config.py
| |_wsgi.py
|_env
|_start
|_stop
|_uwsgi.ini
My path/to/myproject/uwsgi.ini file is configured as follows:
[uwsgi]
master = True
http-socket = 127.0.0.1:9999
virtualenv = /path/to/myproject/env/
daemonize = /path/to/myproject/uwsgi.log
pidfile = /path/to/myproject/tmp/myapp.pid
workers = 2
threads = 2
# adjust the following to point to your project
wsgi-file = /path/to/myproject/airflow/wsgi.py
touch-reload = /path/to/myproject/airflow/wsgi.py
and currently the /path/to/myproject/airflow/wsgi.py looks as follows:
def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    return [b'Hello World!']
I'm assuming I have to somehow call the Airflow Flask app from the wsgi.py file (perhaps also changing some reverse-proxy-fix settings, since I'm behind SSL), but I'm stuck; what do I have to configure?
Will this procedure then be identical for the workers and scheduler?
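For reference, and not as a verified deployment, a minimal wsgi.py that hands the Airflow webserver's Flask app to uWSGI might look like the sketch below. It assumes Airflow 2.x, where airflow.www.app.cached_app() returns the Flask app, and that AIRFLOW_HOME should point at the folder containing airflow.cfg:
# /path/to/myproject/airflow/wsgi.py -- sketch, assumes Airflow 2.x
import os
# Assumption: airflow.cfg and webserver_config.py live next to this file
os.environ.setdefault("AIRFLOW_HOME", "/path/to/myproject/airflow")
from airflow.www.app import cached_app
# uWSGI serves the callable named "application" from the wsgi-file
application = cached_app()
The scheduler and workers are not WSGI applications, so they would still be run as ordinary processes (airflow scheduler, airflow celery worker, etc.) rather than through uWSGI. For the SSL/reverse-proxy part, Airflow's [webserver] enable_proxy_fix setting is the usual knob, though whether it is needed here is untested.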

How to use airflow configuration file (airflow.cfg) when airflow run in container?

I'm using Airflow running in containers as described here. It seems that the airflow.cfg configuration file on the host has no impact on Airflow. I tried the solution here, but it didn't help.
The configuration fields I changed are:
default_timezone = system #(from utc)
load_examples = False #(from True)
base_url = http://localhost:8081 #(from 8080)
default_ui_timezone = system #(from UTC)
I didn't see any impact on Airflow, even though I ran docker-compose down and docker-compose up.
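For what it's worth (a sketch, assuming the compose file follows the official docker-compose.yaml layout): in that setup, configuration is normally overridden through AIRFLOW__{SECTION}__{KEY} environment variables in the compose file rather than through an airflow.cfg on the host, which the containers only see if it is explicitly mounted. The same four fields as environment overrides would look roughly like:
# docker-compose.yaml (fragment): override settings via environment variables
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    AIRFLOW__CORE__DEFAULT_TIMEZONE: 'system'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__WEBSERVER__BASE_URL: 'http://localhost:8081'
    AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE: 'system'
Note that base_url alone doesn't change the published port; to reach the UI on 8081, the webserver service's ports: mapping would also need adjusting (e.g. "8081:8080", assuming the container still serves on 8080).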

Airflow - Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url

I am running Airflow v1.9 with the Celery executor. I have 5 Airflow workers running on 5 different machines. The Airflow scheduler also runs on one of these machines, and I have copied the same airflow.cfg file across all 5 machines.
I have daily workflows set up in different queues like DEV, QA, etc. (each worker runs with an individual queue name), and they are running fine.
While scheduling a DAG on one of the workers (no other DAGs had been set up for this worker/machine previously), I see the following error on the first task, and as a result the downstream tasks fail:
*** Log file isn't local.
*** Fetching here: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
I have configured MySQL for storing the DAG metadata. When I checked the task_instance table, I saw that the proper hostnames are populated against the tasks.
I also checked the log location and found that the log is getting created.
airflow.cfg snippet:
base_log_folder = /var/log/airflow
base_url = http://<webserver ip>:8082
worker_log_server_port = 8793
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
What am I missing here? What configurations do I need to check additionally for resolving this issue?
Looks like the worker's hostname is not being resolved correctly.
Add a file hostname_resolver.py:
import os
import socket
import requests
def resolve():
    """
    Resolves Airflow external hostname for accessing logs on a worker
    """
    if 'AWS_REGION' in os.environ:
        # Return EC2 instance hostname:
        return requests.get(
            'http://169.254.169.254/latest/meta-data/local-ipv4').text
    # Use DNS request for finding out what's our external IP:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(('1.1.1.1', 53))
    external_ip = s.getsockname()[0]
    s.close()
    return external_ip
And export: AIRFLOW__CORE__HOSTNAME_CALLABLE=airflow.hostname_resolver:resolve
The webserver on the master needs to reach the worker to fetch the log and display it on the front-end page, and to do that it has to resolve the worker's hostname. If the hostname cannot be resolved, add a hostname-to-IP mapping in /etc/hosts on the master.
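For example (the hostname and IP are placeholders), the mapping on the master would be a line like:
# /etc/hosts on the machine running the webserver
192.168.xxx.yyy    worker_hostname_0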
If this happens as part of a Docker Compose Airflow setup, the hostname resolution needs to be passed to the container hosting the webserver, e.g. through extra_hosts:
# docker-compose.yml
version: "3.9"
services:
  webserver:
    extra_hosts:
      - "worker_hostname_0:192.168.xxx.yyy"
      - "worker_hostname_1:192.168.xxx.zzz"
    ...
  ...
More details here.

Web server (nginx) in Mininet

After using sudo mn to build a simple network in Mininet, I used nginx to set up a web server on host1.
I ran systemctl start nginx in host1's xterm, but it seems to start a web server on my localhost, not inside Mininet. I cannot access that web server from host1 or host2 using Firefox inside Mininet.
Is there anything wrong with my approach?
The reason you cannot connect to the server on host1 is, as you said, that the server isn't there: it's running on 127.0.0.1 (localhost) of your host machine, not on any of your Mininet hosts.
The way to get around this is to tell nginx explicitly to listen on the Mininet host's (local) IP via the server conf file.
Here's an example that works for me (tested with nginx 1.4.6, Mininet 2.3.0 and Ubuntu 18.04).
from mininet.topo import Topo
from mininet.node import CPULimitedHost
from mininet.link import TCLink
from mininet.net import Mininet
import time
class DumbbellTopo(Topo):
    def build(self, bw=8, delay="10ms", loss=0):
        switch1 = self.addSwitch('switch1')
        switch2 = self.addSwitch('switch2')
        appClient = self.addHost('aClient')
        appServer = self.addHost('aServer')
        crossClient = self.addHost('cClient')
        crossServer = self.addHost('cServer')
        self.addLink(appClient, switch1)
        self.addLink(crossClient, switch1)
        self.addLink(appServer, switch2)
        self.addLink(crossServer, switch2)
        self.addLink(switch1, switch2, bw=bw, delay=delay, loss=loss, max_queue_size=14)
def simulate():
    dumbbell = DumbbellTopo()
    network = Mininet(topo=dumbbell, host=CPULimitedHost, link=TCLink, autoPinCpus=True)
    network.start()
    appClient = network.get('aClient')
    appServer = network.get('aServer')
    wd = str(appServer.cmd("pwd"))[:-2]
    appServer.cmd("echo 'b a n a n a s' > available-fruits.html")
    appServer.cmd("echo 'events { } http { server { listen " + appServer.IP() + "; root " + wd + "; } }' > nginx-conf.conf")  # Create server config file
    appServer.cmd("sudo nginx -c " + wd + "/nginx-conf.conf &")  # Tell nginx to use configuration from the file we just created
    time.sleep(1)  # Server might need some time to start
    fruits = appClient.cmd("curl http://" + appServer.IP() + "/available-fruits.html")
    print(fruits)
    appServer.cmd("sudo nginx -s stop")
    network.stop()
if __name__ == '__main__':
    simulate()
This way we create the nginx conf file (nginx-conf.conf) and then tell nginx to use it for its configuration.
Alternatively, if you want to start the server from a terminal (xterm) on the Mininet host, create the conf file and then tell nginx to run with that file, as shown in the code above.
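A rough sketch of that terminal workflow (assuming the Mininet host's IP is 10.0.0.1 and you are working in the directory where the files should live; both are assumptions):
# in the Mininet host's xterm
echo 'b a n a n a s' > available-fruits.html
echo "events { } http { server { listen 10.0.0.1; root $PWD; } }" > nginx-conf.conf
sudo nginx -c "$PWD/nginx-conf.conf"
# then, from another Mininet host's xterm:
curl http://10.0.0.1/available-fruits.html
# and to stop it again:
sudo nginx -s stop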
