Airflow 2.0.2 - No user yet created - airflow

We're moving from Airflow 1.x to 2.0.2, and I'm seeing the warning below in my terminal after I run docker-compose run --rm webserver initdb:
{{manager.py:727}} WARNING - No user yet created, use flask fab command to do it.
but in my entrypoint.sh I have the below to create users:
echo "Creating airflow user: ${AIRFLOW_CREATE_USER_USER_NAME}..."
su -c "airflow users create -r ${AIRFLOW_CREATE_USER_ROLE} -u ${AIRFLOW_CREATE_USER_USER_NAME} -e ${AIRFLOW_CREATE_USER_USER_NAME}#vice.com \
-p ${AIRFLOW_CREATE_USER_PASSWORD} -f ${AIRFLOW_CREATE_USER_FIRST_NAME} -l \
${AIRFLOW_CREATE_USER_LAST_NAME}" airflow
echo "Created airflow user: ${AIRFLOW_CREATE_USER_USER_NAME} done!"
;;
Because of this, whenever I run Airflow locally I still have to run the commands below to create a user manually every time I start it up:
docker-compose run --rm webserver bash
airflow users create \
--username name \
--firstname fname \
--lastname lname \
--password pw \
--role Admin \
--email email@email.com

Looking at the Airflow Docker entrypoint script (entrypoint_prod.sh), it looks like Airflow will create an admin user for you when the container boots.
By default the admin user is 'admin' without a password.
If you want something different, set these variables: _AIRFLOW_WWW_USER_PASSWORD and _AIRFLOW_WWW_USER_USERNAME
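For example, a hedged docker-compose fragment for the official apache/airflow image (the service name below is just a placeholder, and depending on the entrypoint version you may also need _AIRFLOW_WWW_USER_CREATE):
services:
  webserver:
    image: apache/airflow:2.0.2
    environment:
      _AIRFLOW_WWW_USER_CREATE: 'true'      # may be required, depending on the entrypoint version
      _AIRFLOW_WWW_USER_USERNAME: admin
      _AIRFLOW_WWW_USER_PASSWORD: changeme  # placeholder password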

(I'm on airflow 2.2.2)
Looks like they changed the admin creation command password from -p test to -p $DEFAULT_PASSWORD. I had to pass in this DEFAULT_PASSWORD env var to the docker-compose environment for the admin user to be created. It also looks like they now suggest using the .env.localrunner file for configuration.
Here is the commit where that change was made.
(I think you asked this question prior to that change being made, but maybe this will help someone in the future who had my same issue).
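For reference, a hedged sketch of passing that variable through docker-compose (the service name here is a placeholder; the entrypoint is what actually consumes DEFAULT_PASSWORD):
services:
  webserver:
    environment:
      - DEFAULT_PASSWORD=changeme  # placeholder; substituted into the entrypoint's "airflow users create ... -p $DEFAULT_PASSWORD"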

Related

How do you access Airflow Web Interface?

Hi, I am taking a DataCamp class on how to use Airflow, and it shows how to create DAGs once you have access to an Airflow web interface.
Is there an easy way to create an account in the Airflow web interface? I am very lost on how to do this. Or is this just an enterprise tool where they provide you access once you pay?
You must do this in a terminal. Run these commands:
export AIRFLOW_HOME=~/airflow
AIRFLOW_VERSION=2.2.5
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow standalone
The terminal output will show the username and password that were generated.
Then open a browser and go to:
localhost:8080
and enter that username and password.
Airflow has a web interface by default, and the default username/password is airflow/airflow.
You can run it with:
airflow webserver --port 8080
then open the link : http://localhost:8080
If you want to create a new user, use this command:
airflow create_user [-h] [-r ROLE] [-u USERNAME] [-e EMAIL] [-f FIRSTNAME]
[-l LASTNAME] [-p PASSWORD] [--use_random_password]
learn more about Running Airflow locally
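Note that airflow create_user is the Airflow 1.x form of the command; on Airflow 2.x the equivalent is airflow users create, for example:
airflow users create \
  --role Admin \
  --username admin \
  --password admin \
  --firstname First \
  --lastname Last \
  --email admin@example.com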
You should install it; it is a Python package, not a website to register on.
The easiest way to install Airflow is:
pip install apache-airflow
if you need extra packages with it:
pip install apache-airflow[postgres,gcp]
Finally, run the webserver and the scheduler in separate terminals:
airflow webserver # it is by default 8080
airflow scheduler

Change Airflow Services Logs Path

I am looking for resources to change the log paths for Airflow services such as the webserver and scheduler. I am running out of space every now and then, so I want to move the logs to a bigger mounted volume.
airflow-scheduler.log
airflow-webserver.log
airflow-scheduler.out
airflow-webserver.out
airflow-scheduler.err
airflow-webserver.err
I am starting the services using the commands below:
airflow webserver -D
airflow scheduler -D
Thanks in advance!
From https://airflow.apache.org/howto/write-logs.html#writing-logs-locally
Users can specify a logs folder in airflow.cfg using the base_log_folder setting. By default, it is in the AIRFLOW_HOME directory.
You need to change the airflow.cfg for log related parameters as below:
[core]
...
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /YOUR_MOUNTED_PATH/logs
...
[webserver]
...
# Log files for the gunicorn webserver. '-' means log to stderr.
access_logfile = /YOUR_MOUNTED_PATH/webserver-access.log
error_logfile = /YOUR_MOUNTED_PATH/webserver-error.log
...
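If you would rather not edit airflow.cfg, the same settings can be overridden with environment variables of the form AIRFLOW__{SECTION}__{KEY}. A hedged example (the section for base_log_folder is [core] on 1.10.x and [logging] on 2.x):
export AIRFLOW__CORE__BASE_LOG_FOLDER=/YOUR_MOUNTED_PATH/logs         # Airflow 1.10.x
export AIRFLOW__LOGGING__BASE_LOG_FOLDER=/YOUR_MOUNTED_PATH/logs      # Airflow 2.x
export AIRFLOW__WEBSERVER__ACCESS_LOGFILE=/YOUR_MOUNTED_PATH/webserver-access.log
export AIRFLOW__WEBSERVER__ERROR_LOGFILE=/YOUR_MOUNTED_PATH/webserver-error.log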
The log location can be specified in airflow.cfg as follows. By default, it is under AIRFLOW_HOME:
[core]
...
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /airflow/logs
...
Please refer to this for additional information https://airflow.apache.org/howto/write-logs.html?highlight=logs
In both master (code) and the 1.10 branch (code), the locations of the following files are hardcoded unless you pass an argument to the cli:
airflow-webserver.err
airflow-webserver.out
airflow-webserver.log
airflow-scheduler.err
airflow-scheduler.out
airflow-scheduler.log
The rest of the log locations can be modified through one of the following variables:
In the [core] section:
base_log_folder
log_filename_template
log_processor_filename_template
dag_processor_manager_log_location
And in the [webserver] section:
access_logfile
error_logfile
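For example, the relevant airflow.cfg entries look roughly like this (the paths and templates below are placeholders based on the defaults):
[core]
base_log_folder = /YOUR_MOUNTED_PATH/logs
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
dag_processor_manager_log_location = /YOUR_MOUNTED_PATH/logs/dag_processor_manager/dag_processor_manager.log

[webserver]
access_logfile = /YOUR_MOUNTED_PATH/webserver-access.log
error_logfile = /YOUR_MOUNTED_PATH/webserver-error.log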
You can supply flags to the airflow webserver -D and airflow scheduler -D commands to put all of the generated webserver and scheduler log files where you want them. Here's an example:
airflow webserver -D \
--port 8080 \
-A $AIRFLOW_HOME/logs/webserver/airflow-webserver.out \
-E $AIRFLOW_HOME/logs/webserver/airflow-webserver.err \
-l $AIRFLOW_HOME/logs/webserver/airflow-webserver.log \
--pid $AIRFLOW_HOME/logs/webserver/airflow-webserver.pid \
--stderr $AIRFLOW_HOME/logs/webserver/airflow-webserver.stderr \
--stdout $AIRFLOW_HOME/logs/webserver/airflow-webserver.stdout
and
airflow scheduler -D \
-l $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.log \
--pid $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.pid \
--stderr $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.stderr \
--stdout $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.stdout
Note: If you use these, you'll need to create the logs/webserver and logs/scheduler subfolders first (see below). This was only tested on Airflow 2.1.2.
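For example:
mkdir -p $AIRFLOW_HOME/logs/webserver $AIRFLOW_HOME/logs/scheduler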

How to put seed data into SQL Server docker image?

I have a project using ASP.NET Core and SQL Server. I am trying to put everything in docker containers. For my app I need to have some initial data in the database.
I am able to use the SQL Server docker image from Microsoft (microsoft/mssql-server-linux), but it is (obviously) empty. Here is my docker-compose.yml:
version: "3"
services:
web:
build: .\MyProject
ports:
- "80:80"
depends_on:
- db
db:
image: "microsoft/mssql-server-linux"
environment:
SA_PASSWORD: "your_password1!"
ACCEPT_EULA: "Y"
I have an SQL script file that I need to run on the database to insert initial data. I found an example for MongoDB, but I cannot figure out which tool I can use instead of mongoimport.
You can achieve this by building a custom image. I'm currently using the following solution. Somewhere in your Dockerfile you should have:
RUN mkdir -p /opt/scripts
COPY database.sql /opt/scripts
ENV MSSQL_SA_PASSWORD=Passw#rd
ENV ACCEPT_EULA=Y
RUN /opt/mssql/bin/sqlservr --accept-eula & sleep 30 & /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P 'Passw#rd' -d master -i /opt/scripts/database.sql
Alternatively you can wait for a certain text to be outputted, useful when working on the dockerfile setup, as it is immediate. It's less robust as it relies on some 'random' text of course:
RUN ( /opt/mssql/bin/sqlservr --accept-eula & ) | grep -q "Service Broker manager has started" \
&& /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P 'Passw#rd' -i /opt/scripts/database.sql
Don't forget to put a database.sql (with your script) next to the dockerfile, as that is copied into the image.
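For example, assuming the Dockerfile above lives in a ./db folder, a hedged way to wire it into the compose file from the question is to build it instead of pulling the stock image:
services:
  db:
    build: ./db            # builds the custom image with database.sql baked in
    environment:
      SA_PASSWORD: "your_password1!"
      ACCEPT_EULA: "Y"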
Roet's answer (https://stackoverflow.com/a/52280924/10446284) didn't work for me.
The trouble was the bash ampersands firing sqlcmd too early, without waiting for sleep 30 to finish.
Our Dockerfile now looks like this:
FROM microsoft/mssql-server-linux:2017-GA
RUN mkdir -p /opt/scripts
COPY db-seed/seed.sql /opt/scripts/
ENV MSSQL_SA_PASSWORD=Passw#rd
ENV ACCEPT_EULA=true
RUN /opt/mssql/bin/sqlservr & sleep 60; /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P 'Passw#rd' -d master -i /opt/scripts/seed.sql
Footnotes:
The bash command now works like this: sqlservr is started in the background with &, and then sleep 60; sqlcmd runs synchronously after the wait.
We chose sleep 60 because the build of the docker image happens "offline", before the runtime environment is set up, so those 60 seconds do not occur at container runtime. Giving the sqlservr command more time gives our teammates' machines more time to complete the docker build phase successfully.
One simple option is to just navigate to the container file system and copy the database files in, and then use a script to attach.
This page, https://learn.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker, has an example of using sqlcmd in a docker container, although I'm not sure how you would add this to whatever build process you have.
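A rough sketch of the copy-and-attach approach, assuming the compose service is called db and the database files are named MyDb.mdf / MyDb_log.ldf (both placeholders):
docker cp MyDb.mdf "$(docker-compose ps -q db)":/var/opt/mssql/data/
docker cp MyDb_log.ldf "$(docker-compose ps -q db)":/var/opt/mssql/data/
docker-compose exec db /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P 'your_password1!' \
  -Q "CREATE DATABASE MyDb ON (FILENAME = '/var/opt/mssql/data/MyDb.mdf'), (FILENAME = '/var/opt/mssql/data/MyDb_log.ldf') FOR ATTACH"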

run apps using audio in a docker container

This question is inspired by Can you run GUI apps in a docker container?.
The basic idea is to run apps with audio and ui (vlc, firefox, skype, ...)
I was searching for docker containers using pulseaudio, but all containers I found were using pulseaudio streaming over tcp
(security sandboxing of the applications)
https://gist.github.com/hybris42/ce429de428e5af3a344a
https://github.com/jlund/docker-chrome-pulseaudio
https://github.com/tomparys/docker-skype-pulseaudio
In my case I would prefer playing audio from an app inside the container directly to my host pulseaudio (without ssh tunneling and bloated docker images).
Pulseaudio because my qt app is using it ;)
It took me some time to find out what is needed (on Ubuntu).
We start with the docker run command docker run -ti --rm myContainer sh -c "echo run something".
ALSA:
We need /dev/snd and, as it looks, some hardware access.
When we put this together we have:
docker run -ti --rm \
-v /dev/snd:/dev/snd \
--lxc-conf='lxc.cgroup.devices.allow = c 116:* rwm' \
myContainer sh -c "echo run something"
In new docker versions without lxc flags you should use this:
docker run -ti --rm \
-v /dev/snd:/dev/snd \
--privileged \
myContainer sh -c "echo run something"
PULSEAUDIO:
Update: it may be enough to mount the pulseaudio socket into the container using the -v option. This depends on your version and preferred access method; see other answers for the socket method.
Here we basically need /dev/shm, /etc/machine-id and /run/user/$uid/pulse. But that is not all (maybe because of Ubuntu and how they did it in the past). The environment variable XDG_RUNTIME_DIR has to be the same on the host system and in your docker container. You may also need /var/lib/dbus because some apps access the machine id from there (it may only contain a symbolic link to the 'real' machine id). And lastly you may need the hidden home folder ~/.pulse for some temp data (I am not sure about this).
docker run -ti --rm \
-v /dev/shm:/dev/shm \
-v /etc/machine-id:/etc/machine-id \
-v /run/user/$uid/pulse:/run/user/$uid/pulse \
-v /var/lib/dbus:/var/lib/dbus \
-v ~/.pulse:/home/$dockerUsername/.pulse \
myContainer sh -c "echo run something"
In new docker versions you might need to add --privileged.
Of course you can combine both together and use it together with xServer ui forwarding like here: https://stackoverflow.com/a/28971413/2835523
Just to mention:
you can handle most of this (everything except the user id) in the Dockerfile
use uid=$(id -u) to get the user id and gid=$(id -g) for the group id
create a docker user with this id
create user script:
mkdir -p /home/$dockerUsername && \
echo "$dockerUsername:x:${uid}:${gid}:$dockerUsername,,,:/home/$dockerUsername:/bin/bash" >> /etc/passwd && \
echo "$dockerUsername:x:${uid}:" >> /etc/group && \
mkdir /etc/sudoers.d && \
echo "$dockerUsername ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$dockerUsername && \
chmod 0440 /etc/sudoers.d/$dockerUsername && \
chown ${uid}:${gid} -R /home/$dockerUsername
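For illustration, a hedged Dockerfile fragment wiring that script in (assuming it is saved next to the Dockerfile as create-user.sh; the default values are placeholders):
ARG uid=1000
ARG gid=1000
ARG dockerUsername=dockeruser
COPY create-user.sh /usr/local/bin/create-user.sh
RUN uid=$uid gid=$gid dockerUsername=$dockerUsername bash /usr/local/bin/create-user.sh
USER $dockerUsername
Build it with something like docker build --build-arg uid=$(id -u) --build-arg gid=$(id -g) --build-arg dockerUsername=$USER -t myContainer .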
Inspired by the links you've posted, I was able to create the following solution. It is as lightweight as I could get it. However, I'm not sure if it is (1) secure, and (2) entirely fits your use-case (as it still uses the network).
1. Install paprefs on your host system, e.g. using sudo apt-get install paprefs on an Ubuntu machine.
2. Launch PulseAudio Preferences, go to the "Network Server" tab, and check the "Enable network access to local sound devices" checkbox [1].
3. Restart your computer. (Only restarting Pulseaudio didn't work for me on Ubuntu 14.10.)
4. Install Pulseaudio in your container, e.g. sudo apt-get install -y pulseaudio.
5. In your container, run export "PULSE_SERVER=tcp:<host IP address>:<host Pulseaudio port>". For example, export "PULSE_SERVER=tcp:172.16.86.13:4713" [2]. You can find out your IP address using ifconfig and the Pulseaudio port using pax11publish [1].
That's it. Step 5 should probably be automated if the IP address and Pulseaudio port are subject to change. Additionally, I'm not sure if Docker permanently stores environment variables like PULSE_SERVER: If it doesn't then you have to initialize it after each container start.
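For example, a rough way to automate step 5 at container start (4713 is the default Pulseaudio TCP port; the interface detection below is an assumption for a typical single-NIC setup):
HOST_IP=$(hostname -I | awk '{print $1}')
docker run -e PULSE_SERVER="tcp:${HOST_IP}:4713" myimage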
Suggestions to make my approach even better would be greatly appreciated, since I'm currently working on a similar problem as the OP.
References:
[1] https://github.com/jlund/docker-chrome-pulseaudio
[2] https://github.com/jlund/docker-chrome-pulseaudio/blob/master/Dockerfile
UPDATE (and probably the better solution):
This also works using a Unix socket instead of a TCP socket:
Start the container with -v /run/user/$UID/pulse/native:/path/to/pulseaudio/socket
In the container, run export "PULSE_SERVER=unix:/path/to/pulseaudio/socket"
The /path/to/pulseaudio/socket can be anything, for testing purposes I used /home/user/pulse.
Maybe it will even work with the same path as on the host (taking care of the $UID part) as the default socket; in that case the ultimate solution would be -v /run/user/$UID/pulse/native:/run/user/<UID in container>/pulse. I haven't tested this, however.
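Putting the update together, a minimal sketch (the in-container path is arbitrary, as noted above):
docker run \
  -v /run/user/$UID/pulse/native:/home/user/pulse \
  -e PULSE_SERVER=unix:/home/user/pulse \
  myimage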
After trying most of the solutions described here, I found that only PulseAudio over the network really works. However, you can make it safe by keeping the authentication.
Install paprefs (on host machine):
$ apt-get install paprefs
Launch paprefs (PulseAudio Preferences) > Network Server > [X] Enable network access to local sound devices.
Restart PulseAudio:
$ service pulseaudio restart
Check it worked or restart machine:
$ (pax11publish || xprop -root PULSE_SERVER) | grep -Eo 'tcp:[^ ]*'
tcp:myhostname:4713
Now use that socket:
$ docker run \
-e PULSE_SERVER=tcp:$(hostname -i):4713 \
-e PULSE_COOKIE=/run/pulse/cookie \
-v ~/.config/pulse/cookie:/run/pulse/cookie \
...
Check that the user running inside the container has access to the cookie file ~/.config/pulse/cookie.
To test it works:
$ apt-get install mplayer
$ mplayer /usr/share/sounds/alsa/Front_Right.wav
For more info may check Docker Mopidy project.
Assuming pulseaudio is installed on host and in image, one can provide pulseaudio sound over tcp with only a few steps. pulseaudio does not need to be restarted, and no configuration has to be done on host or in image either. This way it is included in x11docker, without the need of VNC or SSH:
First, find a free tcp port:
read LOWERPORT UPPERPORT < /proc/sys/net/ipv4/ip_local_port_range
while : ; do
PULSE_PORT="`shuf -i $LOWERPORT-$UPPERPORT -n 1`"
ss -lpn | grep -q ":$PULSE_PORT " || break
done
Get the IP address of the docker daemon. I always find it to be 172.17.42.1/16:
ip -4 -o a | grep docker0 | awk '{print $4}'
Load the pulseaudio tcp module and authenticate connections from the docker IP range:
PULSE_MODULE_ID=$(pactl load-module module-native-protocol-tcp port=$PULSE_PORT auth-ip-acl=172.17.42.1/16)
On docker run, set the environment variable PULSE_SERVER:
docker run -e PULSE_SERVER=tcp:172.17.42.1:$PULSE_PORT yourimage
Afterwards, unload the tcp module. (Note: for unknown reasons, unloading this module can stop the pulseaudio daemon on the host.)
pactl unload-module $PULSE_MODULE_ID
Edit: How-To for ALSA and Pulseaudio in container
I managed to dockerize a Java game in the following way, effectively passing through the game's sound.
This approach requires building an image and making sure the app has all the dependencies it will need, in this case pulseaudio and x11. If you're sure your image has everything it needs, you may proceed as stated in the previous answers.
Here, we need to build the image, then we can actually launch it.
docker build -t my-unciv-image . # Run from directory where Dockerfile is
docker run --name unciv \
--device /dev/dri \
-e DISPLAY=$DISPLAY \
-e PULSE_SERVER=unix:/run/user/1000/pulse/native \
--privileged \
-u $(id -u):$(id -g) \
-v /path/to/Unciv:/App \
-v /run/user/$(id -u)/pulse:/run/user/$(id -u)/pulse \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-w /App \
my-unciv-image \
java -jar /App/Unciv.jar
In the second command the following is specified:
--name: a name is given to the container
--device: video device*
-e: required environment vars
DISPLAY: the display number
PULSE_SERVER: PulseAudio audio server socket
--privileged: run it privileged*, so it can access all devices
-v: Mounted volumes:
Path to the game mounted into /App in the container**
Audio server socket
Display server socket
-w: Working directory
Here is a docker-compose.yml version of it:
# docker-compose.yml
version: '3'
services:
  unciv:
    build: .
    container_name: unciv
    devices:
      - /dev/dri:/dev/dri # * Either this
    entrypoint: java -jar /App/Unciv.jar
    environment:
      - DISPLAY=$DISPLAY
      - PULSE_SERVER=unix:/run/user/1000/pulse/native
    privileged: true # * or this
    user: 1000:1000
    volumes:
      - /path/to/game/:/App
      - /run/user/1000/pulse:/run/user/1000/pulse
      - /tmp/.X11-unix:/tmp/.X11-unix
    working_dir: /App
And the Dockerfile used to build the image:
FROM ubuntu:20.04
RUN apt-get update
RUN apt-get install -y openjdk-11-jre
RUN apt-get install -y xserver-xorg-video-all
RUN apt-get install -y libgl1-mesa-glx libgl1-mesa-dri
RUN apt-get install -y pulseaudio
# create the unciv user referenced below (at run time the -u flag / user: entry overrides it anyway)
RUN useradd -m unciv
USER unciv
Notes:
*Only required for a game or anything that uses OpenGL. Either pass the devices explicitly or run it as privileged; I think it's enough to pass the device, and making it privileged may be overkill.
**The game itself could be bundled into the docker image, but for a demo it is simply mounted from the host.
For the audio, it's required to pass the PULSE_SERVER environment variable and to mount the pulseaudio socket.

Using ENV variables in daemonized Docker running RStudio

I am able to set up a Dockerfile with default ENV variables that I can then configure when running my docker container, e.g. in a Dockerfile I have the lines:
ENV USERNAME ropensci
ENV EMAIL ropensci@github.com
RUN git config --global user.name $USERNAME
RUN git config --global user.email $EMAIL
Great. When I launch an interactive session:
docker run -it --env USERNAME="Carl" --env EMAIL=cboettig@example.com myimage /bin/bash
I can then issue the command git config --list and see that git is configured to use the values I provided on the command line instead of the defaults.
However, my Dockerfile is also configured to run an RStudio server that I can then log into in the browser when running the image in Daemon mode:
docker run -d -p 8787:8787 --env USERNAME="Carl" --env EMAIL=cboettig@example.com cboettig/ropensci-docker
I go to localhost:8787 and log in to RStudio which all works as expected, start a new "Project" with git enabled, but then RStudio cannot find my git name & email. I can open the shell from the RStudio menu and run git config --list or echo $USERNAME and I just get a blank value. Why does this work for /bin/bash but not from RStudio and how do I fix it?
Your git config is set in /.gitconfig, which belongs to the root user. You need to set the git config for the rstudio user, because RStudio runs as the rstudio user. The command below is a temporary solution.
docker run -it -p 8787:8787 --env USERNAME="Carl" --env EMAIL=cboettig@example.com cboettig/ropensci-docker bash -c "cp /.gitconfig /home/rstudio; /usr/bin/supervisord"
It works!
Another solution is to write a Dockerfile based on cboettig/ropensci-docker. Below is a sample Dockerfile:
FROM cboettig/ropensci-docker
RUN cp /.gitconfig /home/rstudio
CMD ["/usr/bin/supervisord"]
