Airflow secure-file-priv option - cannot insert data from file to database - airflow

I have a DAG of three tasks. The first one checks whether a CSV file exists, the second creates a MySQL table, and the third inserts the data from the CSV file into the MySQL table. The first two tasks succeed, but the last one fails, saying that because of the secure-file-priv option I have no rights to load data from the file into the database.
import airflow.utils.dates
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.mysql.operators.mysql import MySqlOperator

default_args = {
    "owner": "airflow",
    "start_date": airflow.utils.dates.days_ago(1),
    "retries": 1,
    "retry_delay": timedelta(seconds=5)
}

with DAG('sql_operator_from_csv_to_mysql', default_args=default_args, schedule_interval='@daily',
         template_searchpath=['/opt/airflow/sql_files_mysql'], catchup=True) as dag:
    t1 = BashOperator(task_id='check_file_exists', bash_command='ls /opt/airflow/store_files/students_data.csv | sha1sum', retries=2, retry_delay=timedelta(seconds=15))
    t2 = MySqlOperator(task_id='create_mysql_table', mysql_conn_id="mysql_conn", sql="create_table.sql")
    t3 = MySqlOperator(task_id='insert_into_table', mysql_conn_id="mysql_conn", sql="insert_into_table.sql")
SQL statement:
LOAD DATA INFILE '/store_files/students_data.csv' INTO TABLE students_db FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
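A commonly used alternative, shown here only as a hedged sketch (it is not something the post tried), is LOAD DATA LOCAL INFILE, which streams the file from the client side and is therefore not restricted by secure-file-priv; it does require local_infile to be enabled on the server and on the client connection, and the path must then be the one visible to the Airflow worker:
LOAD DATA LOCAL INFILE '/opt/airflow/store_files/students_data.csv'
INTO TABLE students_db
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;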
The mysql.cnf file is as follows:
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
secure-file-priv= ""
I have tried setting the secure-file-priv option both to an empty string and to the location of the CSV file, but neither works:
secure-file-priv= ""
secure-file-priv= "/opt/airflow/store_files"
The error message says:
[2022-06-12, 09:44:24 UTC] {taskinstance.py:1774} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/mysql/operators/mysql.py", line 84, in execute
hook.run(self.sql, autocommit=self.autocommit, parameters=self.parameters)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/hooks/dbapi.py", line 205, in run
self._run_command(cur, sql_statement, parameters)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/hooks/dbapi.py", line 229, in _run_command
cur.execute(sql_statement)
File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/connections.py", line 254, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1290, 'The MySQL server is running with the --secure-file-priv option so it cannot execute this statement')
[2022-06-12, 09:44:24 UTC] {taskinstance.py:1288} INFO - Marking task as FAILED. dag_id=sql_operator_from_csv_to_mysql, task_id=insert_into_table, execution_date=20220612T094413, start_date=20220612T094424, end_date=20220612T094424
[2022-06-12, 09:44:24 UTC] {standard_task_runner.py:98} ERROR - Failed to execute job 10 for task insert_into_table ((1290, 'The MySQL server is running with the --secure-file-priv option so it cannot execute this statement'); 1190)
[2022-06-12, 09:44:24 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
[2022-06-12, 09:44:24 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
Any ideas to solve this problem?

I have tried making some minor modifications to the docker-compose.yaml file.
My docker-compose.yaml file is as follows:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow.
# Default: apache/airflow:2.3.2
# AIRFLOW_UID - User ID in Airflow containers
# Default: 50000
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested).
# Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested).
# Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
# Default: ''
#
# Feel free to modify this file to suit your needs.
---
version: '3'
x-airflow-common:
&airflow-common
# In order to add custom dependencies or upgrade provider packages you can use your extended image.
# Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
# and uncomment the "build" line below, Then run `docker-compose build` to build the images.
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.3.2}
# build: .
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
# For backward compatibility, with Airflow <2.3
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
volumes:
- ./dags:/usr/local/airflow/dags
- ./store_files:/usr/local/airflow/store_files_airflow
- ./sql_files_mysql:/usr/local/airflow/sql_files
user: "${AIRFLOW_UID:-50000}:0"
depends_on:
&airflow-common-depends-on
redis:
condition: service_healthy
postgres:
condition: service_healthy
services:
postgres:
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
redis:
image: redis:latest
expose:
- 6379
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
mysql:
image: mysql:latest
environment:
- MYSQL_ROOT_PASSWORD=airflow
volumes:
- ./store_files:/var/lib/mysql-files
- ./mysql.cnf:/etc/mysql/mysql.cnf
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
environment:
<<: *airflow-common-env
# Required to handle warm shutdown of the celery workers properly
# See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
DUMB_INIT_SETSID: "0"
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-triggerer:
<<: *airflow-common
command: triggerer
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-init:
<<: *airflow-common
entrypoint: /bin/bash
# yamllint disable rule:line-length
command:
- -c
- |
function ver() {
printf "%04d%04d%04d%04d" $${1//./ }
}
airflow_version=$$(gosu airflow airflow version)
airflow_version_comparable=$$(ver $${airflow_version})
min_airflow_version=2.2.0
min_airflow_version_comparable=$$(ver $${min_airflow_version})
if (( airflow_version_comparable < min_airflow_version_comparable )); then
echo
echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
echo
exit 1
fi
if [[ -z "${AIRFLOW_UID}" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
echo "If you are on Linux, you SHOULD follow the instructions below to set "
echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
echo "For other operating systems you can get rid of the warning with manually created .env file:"
echo " See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
echo
fi
one_meg=1048576
mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
disk_available=$$(df / | tail -1 | awk '{print $$4}')
warning_resources="false"
if (( mem_available < 4000 )) ; then
echo
echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
echo
warning_resources="true"
fi
if (( cpus_available < 2 )); then
echo
echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
echo "At least 2 CPUs recommended. You have $${cpus_available}"
echo
warning_resources="true"
fi
if (( disk_available < one_meg * 10 )); then
echo
echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
echo
warning_resources="true"
fi
if [[ $${warning_resources} == "true" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
echo "Please follow the instructions to increase amount of resources available:"
echo " https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
echo
fi
mkdir -p /sources/logs /sources/dags /sources/plugins
chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
exec /entrypoint airflow version
# yamllint enable rule:line-length
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
_PIP_ADDITIONAL_REQUIREMENTS: ''
user: "0:0"
volumes:
- .:/sources
airflow-cli:
<<: *airflow-common
profiles:
- debug
environment:
<<: *airflow-common-env
CONNECTION_CHECK_MAX_COUNT: "0"
# Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
command:
- bash
- -c
- airflow
# You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
# or by explicitly targeted on the command line e.g. docker-compose up flower.
# See: https://docs.docker.com/compose/profiles/
flower:
<<: *airflow-common
command: celery flower
profiles:
- flower
ports:
- 5555:5555
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
volumes:
postgres-db-volume:
And the mysql.cnf file is as below:
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
secure-file-priv= ""

Related

Problem when I try to run a DAG ("NameError: name 'args' is not defined" and "No viable dags retrieved" error)

I'm trying to run an Airflow example that runs a Spark job.
Here is the link to the tutorial: https://www.projectpro.io/recipes/use-sparksubmitoperator-airflow-dag#mcetoc_1g2jkipvff
Unfortunately I get an error that I don't understand.
In my dags folder I have the following three files:
The sparkoperator_demo.py file:
import airflow
from datetime import timedelta
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.utils.dates import days_ago
default_args = {
'owner': 'airflow',
#'start_date': airflow.utils.dates.days_ago(2),
# 'end_date': datetime(),
# 'depends_on_past': False,
# 'email': ['airflow@example.com'],
# 'email_on_failure': False,
#'email_on_retry': False,
# If a task fails, retry it once after waiting
# at least 5 minutes
#'retries': 1,
'retry_delay': timedelta(minutes=5),
}
dag_spark = DAG(
dag_id = "sparkoperator_demo",
default_args=args,
# schedule_interval='0 0 * * *',
schedule_interval='@once',
dagrun_timeout=timedelta(minutes=60),
description='use case of sparkoperator in airflow',
start_date = airflow.utils.dates.days_ago(1)
)
spark_submit_local = SparkSubmitOperator(
application ='/home/hduser/basicsparksubmit.py' ,
conn_id= 'spark_local',
task_id='spark_submit_task',
dag=dag_spark
)
if __name__ == "__main__":
dag_spark.cli()
The sparksubmit_basic.py file:
from pyspark import SparkContext
logFilepath = "count.txt"
sc = SparkContext("local", "first app")
logData = sc.textFile(logFilepath).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
The count.txt file:
test count world of this file
Then, I have this docker-compose file:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow.
# Default: apache/airflow:2.5.0
# AIRFLOW_UID - User ID in Airflow containers
# Default: 50000
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested).
# Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested).
# Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
# Default: ''
#
# Feel free to modify this file to suit your needs.
---
version: '3'
x-airflow-common:
&airflow-common
# In order to add custom dependencies or upgrade provider packages you can use your extended image.
# Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
# and uncomment the "build" line below, Then run `docker-compose build` to build the images.
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.5.0}
# build: .
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
# For backward compatibility, with Airflow <2.3
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-apache-spark apache-airflow-providers-cncf-kubernetes}
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
user: "${AIRFLOW_UID:-50000}:0"
depends_on:
&airflow-common-depends-on
redis:
condition: service_healthy
postgres:
condition: service_healthy
services:
postgres:
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
redis:
image: redis:latest
expose:
- 6379
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
environment:
<<: *airflow-common-env
# Required to handle warm shutdown of the celery workers properly
# See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
DUMB_INIT_SETSID: "0"
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-triggerer:
<<: *airflow-common
command: triggerer
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-init:
<<: *airflow-common
entrypoint: /bin/bash
# yamllint disable rule:line-length
command:
- -c
- |
function ver() {
printf "%04d%04d%04d%04d" $${1//./ }
}
airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version)
airflow_version_comparable=$$(ver $${airflow_version})
min_airflow_version=2.2.0
min_airflow_version_comparable=$$(ver $${min_airflow_version})
if (( airflow_version_comparable < min_airflow_version_comparable )); then
echo
echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
echo
exit 1
fi
if [[ -z "${AIRFLOW_UID}" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
echo "If you are on Linux, you SHOULD follow the instructions below to set "
echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
echo "For other operating systems you can get rid of the warning with manually created .env file:"
echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
echo
fi
one_meg=1048576
mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
disk_available=$$(df / | tail -1 | awk '{print $$4}')
warning_resources="false"
if (( mem_available < 4000 )) ; then
echo
echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
echo
warning_resources="true"
fi
if (( cpus_available < 2 )); then
echo
echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
echo "At least 2 CPUs recommended. You have $${cpus_available}"
echo
warning_resources="true"
fi
if (( disk_available < one_meg * 10 )); then
echo
echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
echo
warning_resources="true"
fi
if [[ $${warning_resources} == "true" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
echo "Please follow the instructions to increase amount of resources available:"
echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
echo
fi
mkdir -p /sources/logs /sources/dags /sources/plugins
chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
exec /entrypoint airflow version
# yamllint enable rule:line-length
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
_PIP_ADDITIONAL_REQUIREMENTS: ''
user: "0:0"
volumes:
- .:/sources
airflow-cli:
<<: *airflow-common
profiles:
- debug
environment:
<<: *airflow-common-env
CONNECTION_CHECK_MAX_COUNT: "0"
# Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
command:
- bash
- -c
- airflow
# You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
# or by explicitly targeted on the command line e.g. docker-compose up flower.
# See: https://docs.docker.com/compose/profiles/
flower:
<<: *airflow-common
command: celery flower
profiles:
- flower
ports:
- 5555:5555
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
volumes:
postgres-db-volume:
My problem is that when I go to localhost:8080 I get this error message:
Broken DAG: [/opt/airflow/dags/sparkoperator_demo.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/dags/sparkoperator_demo.py", line 24, in <module>
default_args=args,
NameError: name 'args' is not defined
I also noticed that the logs folder contains the following log:
[2023-01-02T13:03:40.715+0000] {processor.py:153} INFO - Started process (PID=75) to work on /opt/airflow/dags/sparkoperator_demo.py
[2023-01-02T13:03:40.760+0000] {processor.py:743} INFO - Processing file /opt/airflow/dags/sparkoperator_demo.py for tasks to queue
[2023-01-02T13:03:40.778+0000] {logging_mixin.py:137} INFO - [2023-01-02T13:03:40.777+0000] {dagbag.py:538} INFO - Filling up the DagBag from /opt/airflow/dags/sparkoperator_demo.py
[2023-01-02T13:03:40.976+0000] {logging_mixin.py:137} INFO - [2023-01-02T13:03:40.937+0000] {dagbag.py:343} ERROR - Failed to import: /opt/airflow/dags/sparkoperator_demo.py
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 339, in parse
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/dags/sparkoperator_demo.py", line 24, in <module>
default_args=args,
NameError: name 'args' is not defined
[2023-01-02T13:03:40.984+0000] {processor.py:755} WARNING - No viable dags retrieved from /opt/airflow/dags/sparkoperator_demo.py
[2023-01-02T13:03:41.357+0000] {processor.py:175} INFO - Processing /opt/airflow/dags/sparkoperator_demo.py took 0.690 seconds
Can you help me, please?
The variable args is indeed not defined; just replace it with default_args, which is the name of your args dict:
dag_spark = DAG(
dag_id = "sparkoperator_demo",
default_args=default_args,
# schedule_interval='0 0 * * *',
schedule_interval='@once',
dagrun_timeout=timedelta(minutes=60),
description='use case of sparkoperator in airflow',
start_date = airflow.utils.dates.days_ago(1)
)

How do I combine two commands in one task? | Ansible

So, my problem is that I want to check whether nginx is installed on two different OSes that use different package managers.
- name: Verifying nginx installation  # RedHat
  command: "rpm -q nginx"
  when: ansible_facts.pkg_mgr in ["yum", "dnf", "rpm"]  # or (ansible_os_family == "RedHat")

- name: Verifying nginx installation  # Debian
  command: "dpkg -l nginx"
  when: ansible_facts.pkg_mgr in ["dpkg", "apt"]  # or (ansible_os_family == "Debian")
Can I combine these into a single task, and if so, how? I need to register the output and use it afterwards, and I can't figure it out.
An alternative solution is to use the package_facts module, like this:
- hosts: localhost
  tasks:
    - package_facts:

    - debug:
        msg: "Nginx is installed!"
      when: "'nginx' in packages"
But you could also register individual variables for your two tasks, and then combine the result:
- hosts: localhost
  tasks:
    - name: Verifying nginx installation  # RedHat
      command: "rpm -q nginx"
      when: ansible_facts.pkg_mgr in ["yum", "dnf", "rpm"]  # or (ansible_os_family == "RedHat")
      failed_when: false
      register: rpm_check

    - name: Verifying nginx installation  # Debian
      command: "dpkg -l nginx"
      when: ansible_facts.pkg_mgr in ["dpkg", "apt"]  # or (ansible_os_family == "Debian")
      failed_when: false
      register: dpkg_check

    - set_fact:
        nginx_result: >-
          {{
            (rpm_check is not skipped and rpm_check.rc == 0) or
            (dpkg_check is not skipped and dpkg_check.rc == 0)
          }}

    - debug:
        msg: "nginx is installed"
      when: nginx_result

I am writing a GitLab pipeline to deploy Terraform files to a Nexus repo, and I want incremental versioning of my folder. I am confused, please help

variables:
TF_ROOT: ${CI_PROJECT_DIR}
TF_CLI_CONFIG_FILE: $CI_PROJECT_DIR/.terraformrc
TF_IN_AUTOMATION: "true"
ARM_SUBSCRIPTION_ID: ""
ARM_TENANT_ID: ""
NEXUS_URL: ""
cache:
key: "${TF_ROOT}"
paths:
- ${TF_ROOT}/.terraform/
.terraform-setup-and-init: &init
ls -al
source <(curl -O -k $NEXUS_URL)
ls -al
chmod +x setup-terraform.sh
ls -al *.sh
source ./setup-terraform.sh
export HTTPS_PROXY=
terraform init -var-file="./environments/US/sev.tfvars"
.validate:
stage: validate
script:
- *init
- terraform validate
.build:
stage: build
script:
- *init
- echo "executing terraform plan, needs provider credentials... Skipping"
# - terraform plan -out="plan.cache"
# - terraform show -json "plan.cache" > plan.json
artifacts:
paths:
- ${TF_ROOT}/plan.cache
reports:
terraform: ${TF_ROOT}/plan.json
.deploy:
stage: deploy
script:
- *init
# - terraform apply -input=false "plan.cache"
only:
variables:
- $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
stages:
validate
build
deploy
tf validate:
extends: .validate
tf plan:
extends: .build
tf apply:
extends: .deploy
dependencies:
- tf plan
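As a minimal sketch of one common approach to incremental versioning (assuming a raw-hosted Nexus repository; NEXUS_USER, NEXUS_PASS and the repository path are placeholders, not values from the pipeline above), GitLab's predefined CI_PIPELINE_IID variable is a per-project counter that increases with every pipeline and can be used as the version:
upload to nexus:
  stage: deploy
  script:
    # CI_PIPELINE_IID increments automatically for every pipeline in the project
    - export VERSION="1.0.${CI_PIPELINE_IID}"
    - tar -czf "terraform-${VERSION}.tar.gz" ./environments ./*.tf
    # upload to a raw-hosted repository; adjust the repository name and credentials to your Nexus setup
    - curl --fail -u "${NEXUS_USER}:${NEXUS_PASS}" --upload-file "terraform-${VERSION}.tar.gz" "${NEXUS_URL}/repository/terraform-raw/terraform-${VERSION}.tar.gz"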

Openstack: Packer + Cloud-Init

I want to create a customized OpenStack openSUSE 15 image that contains some custom software and a graphical interface. I have used an existing openSUSE 15.0 image and Packer to build that image. It works fine. The Packer JSON file is as follows:
{
"builders": [
{
"type" : "openstack",
"ssh_username" : "root",
"image_name": "OpenSUSE_15_custom_kde",
"source_image": "OpenSUSE 15",
"flavor": "m1.medium",
"networks": "public-network"
}
],
"provisioners":[
{
"type": "shell",
"inline": [
"sleep 10",
"sudo -s",
"zypper --gpg-auto-import-keys refresh",
"zypper -n up -y",
"zypper -n clean -a",
"zypper -n addrepo -f http://download.opensuse.org/repositories/devel\\:/languages\\:/R\\:/patched/openSUSE_Leap_15.0/ R-patched",
"zypper -n addrepo -f http://download.opensuse.org/repositories/devel\\:/languages\\:/R\\:/released/openSUSE_Leap_15.0/ R-released",
"zypper --gpg-auto-import-keys refresh",
"zypper -n install -y R-base R-base-devel R-recommended-packages rstudio",
"zypper -n clean -a",
"zypper --non-interactive install -y -t pattern kde kde_plasma devel_kernel devel_python3 devel_C_C++ office x11",
"zypper -n install xrdp",
"zypper -n clean -a",
"zypper -n dup -y",
"systemctl enable xrdp",
"systemctl start xrdp",
"cloud-init clean --logs",
"zypper -n install -y cloud-init growpart yast2-network yast2-services-manager acpid",
"cat /dev/null > /etc/udev/rules.d/70-persistent-net.rules",
"systemctl disable cloud-init.service cloud-final.service cloud-init-local.service cloud-config.service",
"systemctl enable cloud-init.service cloud-final.service cloud-init-local.service cloud-config.service sshd",
"sudo systemctl stop firewalld",
"sudo systemctl disable firewalld",
"sed -i 's/GRUB_TIMEOUT=.*$/GRUB_TIMEOUT=0/g' /etc/default/grub",
"exec grub2-mkconfig -o /boot/grub2/grub.cfg '$#'",
"systemctl restart cloud-init",
"systemctl daemon-reload",
"cat /dev/null > ~/.bash_history && history -c && sudo su",
"cat /dev/null > /var/log/wtmp",
"cat /dev/null > /var/log/btmp",
"cat /dev/null > /var/log/lastlog",
"cat /dev/null > /var/run/utmp",
"cat /dev/null > /var/log/auth.log",
"cat /dev/null > /var/log/kern.log",
"cat /dev/null > ~/.bash_history && history -c",
"rm ~/.ssh/authorized_keys"
]
},
{
"type": "file",
"source": "./cloud_init/cloud.cfg",
"destination": "/etc/cloud/cloud.cfg"
}
]
}
There are no errors in the building and provisioning phases with Packer.
In a second stage, when this base image is spawned through a Heat template via the OpenStack client, I want some personalized tasks to be completed: user creation, granting SSH access (including adjusting the sshd_config file), and so on. This is done through the init_image.sh file.
#!/bin/bash
useradd -m $USERNAME -p $PASSWD -s /bin/bash
usermod -a -G sudo $USERNAME
tee /etc/ssh/banner <<EOF
You are one lucky user, if you bear the key...
EOF
tee /etc/ssh/sshd_config <<EOF
## SOME IMPORTANT SSHD CONFIGURATIONS
EOF
sudo -u $USERNAME -H sh -c 'cd ~;mkdir ~/.ssh/;echo "$SSHPUBKEY" > ~/.ssh/authorized_keys;chmod -R 700 ~/.ssh/;chmod 600 ~/.ssh/authorized_keys;'
systemctl restart sshd.service
voldata_dev="/dev/disk/by-id/virtio-$(echo $VOLDATA | cut -c -20)"
mkfs.ext4 $voldata_dev
mkdir -pv /home/$USERNAME/share
echo "$voldata_dev /home/$USERNAME/share ext4 defaults 1 2" >> /etc/fstab
mount /home/$USERNAME/share
chown -R $USERNAME:users /home/$USERNAME/share/
systemctl enable xrdp
systemctl start xrdp
For this purpose, I have created the following heat template.
heat_template_version: "2018-08-31"
description: "version 2017-09-01 created by HOT Generator at Fri, 05 Jul 2019 12:56:22 GMT."
parameters:
username:
type: string
label: User Name
description: This is the user name, and will be also the name of the key and the server
default: test
imagename:
type: string
label: Image Name
description: This is the Name of the Image e.g. Ubuntu 18.04
default: "OpenSUSE Leap 15"
ssh_pub_key:
type: string
label: ssh public key
flavorname:
type: string
label: Flavor Name
description: This is the Name of the Flavor e.g. m1.small
default: "m1.small"
vol_size:
type: number
label: Volume Size
description: This is the size of the volume that should be attached in GB
default: 10
password:
type: string
label: password
description: This is the su password and user password
resources:
init:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template:
{get_file: init_image.sh}
params:
$USERNAME: {get_param: username}
$SSHPUBKEY: {get_param: ssh_pub_key}
$PASSWD: {get_param: password}
$VOLDATA: {get_resource: volume}
my_key:
type: "OS::Nova::KeyPair"
properties:
name:
list_join:
["_", [ {get_param: username}, 'key']]
public_key: {get_param: ssh_pub_key}
my_server:
type: "OS::Nova::Server"
properties:
block_device_mapping_v2: [{ device_name: "vda", image : { get_param : imagename }, delete_on_termination : "false", volume_size: 20 }]
name: {get_param: username}
flavor: {get_param: flavorname}
key_name: {get_resource: my_key}
admin_pass: {get_param: password}
user_data_format: RAW
user_data: {get_resource: init}
networks:
- network: "public-network"
depends_on:
- my_key
- init
- volume
volume:
type: "OS::Cinder::Volume"
properties:
# Size is given in GB
size: {get_param: vol_size}
name:
list_join: ["-", ["vol_",{get_param: username }]]
volume_attachment:
type: "OS::Cinder::VolumeAttachment"
properties:
volume_id: { get_resource: volume }
instance_uuid: { get_resource: my_server }
depends_on:
- volume
outputs:
instance_ip:
description: The IP address of the deployed instances
value: { get_attr: [my_server, first_address] }
If I use the original image in the template I have no problems (although the build process takes a very long time, and I need to reboot to get the graphical KDE interface).
However, if I use the image built with Packer, my user_data is ignored: I cannot log in, and the personalized user is not created. What have I missed? Why does it not work? As you can see, I clean cloud-init and restart the services... I am stuck big time...
UPDATE
Here is the accessible boot log from the machine.
UPDATE 2
This is the output of cloud-init analyze show:
-- Boot Record 01 --
The total time elapsed since completing an event is printed after the "#" character.
The time the event takes is printed after the "+" character.
Starting stage: init-local
|`->no cache found #00.01000s +00.00000s
|`->no local data found from DataSourceOpenStackLocal #00.04700s +15.23000s
Finished stage: (init-local) 15.31200 seconds
Starting stage: init-network
|`->no cache found #16.01000s +00.00100s
|`->no network data found from DataSourceOpenStack #16.01700s +00.02600s
|`->found network data from DataSourceNone #16.04300s +00.00100s
|`->setting up datasource #16.09000s +00.00000s
|`->reading and applying user-data #16.10000s +00.00200s
|`->reading and applying vendor-data #16.10200s +00.00000s
|`->activating datasource #16.12100s +00.00100s
|`->config-migrator ran successfully #16.17900s +00.00100s
|`->config-seed_random ran successfully #16.18000s +00.00100s
|`->config-bootcmd ran successfully #16.18200s +00.00000s
|`->config-write-files ran successfully #16.18200s +00.00100s
|`->config-growpart ran successfully #16.18300s +00.46100s
|`->config-resizefs ran successfully #16.64500s +01.33400s
|`->config-disk_setup ran successfully #17.98100s +00.00300s
|`->config-mounts ran successfully #17.98500s +00.00400s
|`->config-set_hostname ran successfully #17.99000s +00.09800s
|`->config-update_hostname ran successfully #18.08900s +00.01000s
|`->config-update_etc_hosts ran successfully #18.10000s +00.00100s
|`->config-rsyslog ran successfully #18.10100s +00.00200s
|`->config-users-groups ran successfully #18.10400s +00.00200s
|`->config-ssh ran successfully #18.10700s +00.61400s
Finished stage: (init-network) 02.73600 seconds
Starting stage: modules-config
|`->config-locale ran successfully #35.00200s +00.00400s
|`->config-set-passwords ran successfully #35.00600s +00.00100s
|`->config-zypper-add-repo ran successfully #35.00700s +00.00200s
|`->config-ntp ran successfully #35.01000s +00.00100s
|`->config-timezone ran successfully #35.01100s +00.00200s
|`->config-disable-ec2-metadata ran successfully #35.01300s +00.00100s
|`->config-runcmd ran successfully #35.01800s +00.00200s
Finished stage: (modules-config) 00.05100 seconds
Starting stage: modules-final
|`->config-package-update-upgrade-install ran successfully #35.87400s +00.00000s
|`->config-puppet ran successfully #35.87500s +00.00000s
|`->config-chef ran successfully #35.87600s +00.00000s
|`->config-mcollective ran successfully #35.87600s +00.00100s
|`->config-salt-minion ran successfully #35.87700s +00.00100s
|`->config-rightscale_userdata ran successfully #35.87800s +00.00100s
|`->config-scripts-vendor ran successfully #35.87900s +00.00500s
|`->config-scripts-per-once ran successfully #35.88400s +00.00100s
|`->config-scripts-per-boot ran successfully #35.88500s +00.00000s
|`->config-scripts-per-instance ran successfully #35.88500s +00.00100s
|`->config-scripts-user ran successfully #35.88600s +00.00100s
|`->config-ssh-authkey-fingerprints ran successfully #35.88700s +00.00100s
|`->config-keys-to-console ran successfully #35.88800s +00.09000s
|`->config-phone-home ran successfully #35.97900s +00.00100s
|`->config-final-message ran successfully #35.98000s +00.00600s
|`->config-power-state-change ran successfully #35.98700s +00.00100s
Finished stage: (modules-final) 00.13600 seconds
Total Time: 18.23500 seconds
1 boot records analyzed
Update 3
Apparently, when one does not update with zypper up, cloud-init behaves well and finds the user data. Hence, I will not update the image during provisioning. However, once provisioned, it makes sense to update.
At the end of your provisioning you should stop cloud-init and wipe its state. Otherwise, when the image is launched, cloud-init thinks it has already executed its first boot.
systemctl stop cloud-init
rm -rf /var/lib/cloud/
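A quick sanity check before Packer captures the image (an illustrative addition, not part of the original answer) is to confirm that no per-instance state is left behind:
# verify the state directory is really gone before the image is snapshotted
ls /var/lib/cloud/ 2>/dev/null || echo "cloud-init state wiped"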

Saltstack: ignoring result of cmd.run

I am trying to invoke a command during provisioning via SaltStack. If the command fails, the state fails, and I don't want that (the command's return code doesn't matter).
Currently I have the following workaround:
Run something:
  cmd.run:
    - name: command_which_can_fail || true
Is there any way to make such a state ignore the return code using Salt features? Or maybe I can exclude this state from the logs?
Use check_cmd:
fails:
  cmd.run:
    - name: /bin/false

succeeds:
  cmd.run:
    - name: /bin/false
    - check_cmd:
      - /bin/true
Output:
local:
----------
ID: fails
Function: cmd.run
Name: /bin/false
Result: False
Comment: Command "/bin/false" run
Started: 16:04:40.189840
Duration: 7.347 ms
Changes:
----------
pid:
4021
retcode:
1
stderr:
stdout:
----------
ID: succeeds
Function: cmd.run
Name: /bin/false
Result: True
Comment: check_cmd determined the state succeeded
Started: 16:04:40.197672
Duration: 13.293 ms
Changes:
----------
pid:
4022
retcode:
1
stderr:
stdout:
Summary
------------
Succeeded: 1 (changed=2)
Failed: 1
------------
Total states run: 2
If you don't care what the result of the command is, you can use:
Run something:
  cmd.run:
    - name: command_which_can_fail; exit 0
This was tested in Salt 2017.7.0 but would probably work in earlier versions.
