"db init" with postgres for Airflow - airflow

I am referring to this doc and this article for linking a Postgres database to Airflow.
Particularly, I added this line to the file airflow.cfg:
sql_alchemy_conn = postgresql+psycopg2://airflowadmin:airflowadmin@localhost/airflowdb
where airflowadmin is both the username and the password for the Postgres user, and airflowdb is a Postgres database on which airflowadmin has all privileges.
However, when I initialize the database with airflow db init, I still see SQLite as the linked database. Full output:
DB: sqlite:////home/userxxxx/airflow/airflow.db
[2021-09-07 12:43:53,827] {db.py:702} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
WARNI [unusual_prefix_xxxxxxxxxxxxxxxxxxxxxxxxx_example_kubernetes_executor_config] Could not import DAGs in example_kubernetes_executor_config.py: No module named 'kubernetes'
WARNI [unusual_prefix_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx_example_kubernetes_executor_config] Install kubernetes dependencies with: pip install apache-airflow['cncf.kubernetes']
Initialization done
What am I missing?

Make sure the airflow.cfg file you are changing is the same one that is actually being loaded by Airflow. From the CLI, run:
airflow info
Look under the Paths info section and compare airflow_home with the path of the folder containing the airflow.cfg file you are modifying.
airflow info:
Apache Airflow
version | 2.1.2
executor | SequentialExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn | sqlite:////home/vagrant/airflow/airflow.db
dags_folder | /home/vagrant/airflow/dags
plugins_folder | /home/vagrant/airflow/plugins
base_log_folder | /home/vagrant/airflow/logs
remote_base_log_folder |
System info
OS
...
...
Paths info
airflow_home | /home/vagrant/airflow
...
When not defined during the local installation process, the default value of airflow_home is AIRFLOW_HOME=~/airflow, so I guess that may be the cause of your problem.
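For example, a minimal sketch of how to point Airflow at the right folder and re-initialize, assuming your edited airflow.cfg lives in /path/to/your/airflow (a hypothetical path):
export AIRFLOW_HOME=/path/to/your/airflow   # point Airflow at the folder that holds the edited airflow.cfg
airflow info | grep sql_alchemy_conn        # should now show the postgresql+psycopg2 connection string
airflow db init                             # re-run the initialization against Postgres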

Related

Airflow DAG automatically triggers

I have created the DAG with the following configuration:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

job_type = 'daily'
SOURCE_PATH = '/home/ubuntu/daily_data'

def get_to_know_details(job_type, SOURCE_PATH):
    print("************************", job_type, SOURCE_PATH)

with DAG(
    dag_id="transformer_daily_v1",
    is_paused_upon_creation=False,
    default_args=default_args,
    description="transformer to insert data",
    start_date=datetime(2022, 9, 20),
    schedule_interval='31 12 * * *',
    catchup=False,
) as dag:
    task1 = PythonOperator(
        task_id="dag_task_1",
        python_callable=get_to_know_details(job_type, SOURCE_PATH),
    )
Each time I start Airflow using the command
airflow standalone
the DAG's function executes automatically, without being triggered, as seen in the logs:
standalone | Starting Airflow Standalone
standalone | Checking database is initialized
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
************************ daily /home/ubuntu/daily_data
WARNI [unusual_prefix_8fc9338bb4cf0c5518fed57dffa1a11abec44c36_example_kubernetes_executor] The example_kubernetes_executor example DAG requires the kubernetes provider. Please install it with: pip install apache-airflow[cncf.kubernetes]
airflow version - 2.2.5
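Note that python_callable=get_to_know_details(job_type, SOURCE_PATH) calls the function with parentheses, so it runs every time the DAG file is parsed, not when a task run is triggered, which matches the print appearing during startup. A minimal sketch of passing the function uncalled, reusing the names from the question, so execution is deferred to the task run:
task1 = PythonOperator(
    task_id="dag_task_1",
    python_callable=get_to_know_details,   # a reference to the function, not a call
    op_args=[job_type, SOURCE_PATH],       # arguments handed to the callable at execution time
)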

Apache airflow celery worker server running in dev mode on production build

I have created a production Docker image using the provided Breeze command-line tool. However, when I run the airflow worker command, I get the following message on the command line.
Breeze command:
./breeze build-image --production-image --python 3.7 --additional-extras=jdbc --additional-python-deps="pandas pymysql" --additional-runtime-apt-deps="default-jre-headless"
Can anyone help with moving the worker off the development server?
airflow-worker_1 | Starting flask
airflow-worker_1 | * Serving Flask app "airflow.utils.serve_logs" (lazy loading)
airflow-worker_1 | * Environment: production
airflow-worker_1 | WARNING: This is a development server. Do not use it in a production deployment.
airflow-worker_1 | Use a production WSGI server instead.
airflow-worker_1 | * Debug mode: off
airflow-worker_1 | [2021-02-08 21:57:58,409] {_internal.py:113} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)
Here is a discussion by an Airflow maintainer on GitHub: https://github.com/apache/airflow/discussions/18519
It's harmless. It's an internal server run by the executor to share logs with the webserver. It has already been corrected in main to use a 'production' setup (though it's not REALLY needed in this case, as the log "traffic" and characteristics are not production-webserver-like).
The fix will be released in Airflow 2.2 (~ a month from now).

Airflow is using SequentialExecutor despite setting executor to LocalExecutor in airflow.cfg

I'm having trouble getting the LocalExecutor to work.
I created a postgres database called airflow and granted all privileges to the airflow user. Finally I updated my airflow.cfg file:
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
executor = LocalExecutor
# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engines; more information on
# their website
sql_alchemy_conn = postgresql+psycopg2://airflow:[MY_PASSWORD]@localhost:5432/airflow
Next I ran:
airflow initdb
airflow scheduler
airflow webserver
I thought it was working, but I noticed my dags were taking a long time to finish. Upon further inspection of my log files, I noticed that they say Airflow is using the SequentialExecutor.
INFO - Job 319: Subtask create_task_send_email [2020-01-07 12:00:16,997] {__init__.py:51} INFO - Using executor SequentialExecutor
Does anyone know what could be causing this?
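A quick way to confirm which executor and metadata database the installation actually resolves, assuming the command runs in the same Python environment and with the same AIRFLOW_HOME as the scheduler (keep in mind that environment variables such as AIRFLOW__CORE__EXECUTOR override airflow.cfg):
python -c "from airflow.configuration import conf; print(conf.get('core', 'executor')); print(conf.get('core', 'sql_alchemy_conn'))"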

Persistence with Bitnami's Wordpress Docker Setup

I'm trying to set up Wordpress with this documentation:
https://github.com/bitnami/bitnami-docker-wordpress#mount-host-directories-as-data-volumes-with-docker-compose
My host directories for the volumes look like this in the docker-compose file:
volumes:
- './mariadb_data:/bitnami'
...
volumes:
- './wordpress_data:/bitnami'
When running docker-compose up, the following errors occur:
mariadb_1 | INFO ==> Starting mysqld_safe...
mariadb_1 | Could not open required defaults file: /opt/bitnami/mariadb/conf/my.cnf
mariadb_1 | Fatal error in defaults handling. Program aborted
mariadb_1 | WARNING: Defaults file '/opt/bitnami/mariadb/conf/my.cnf' not found!
mariadb_1 | Could not open required defaults file: /opt/bitnami/mariadb/conf/my.cnf
mariadb_1 | Fatal error in defaults handling. Program aborted
mariadb_1 | WARNING: Defaults file '/opt/bitnami/mariadb/conf/my.cnf' not found!
mariadb_1 | 171105 05:15:41 mysqld_safe Logging to '/opt/bitnami/mariadb/data/200101d1b330.err'.
mariadb_1 | 171105 05:15:41 mysqld_safe Starting mysqld daemon with databases from /opt/bitnami/mariadb/data
mariadb_1 | /opt/bitnami/mariadb/bin/mysqld_safe_helper: Can't create/write to file '/opt/bitnami/mariadb/data/200101d1b330.err' (Errcode: 2 "No such file or directory")
myproject_mariadb_1 exited with code 1
However, if I change my docker-compose file to use non-host directories:
volumes:
- 'mariadb_data:/bitnami'
...
volumes:
- 'wordpress_data:/bitnami'
... the docker-compose up works.
If I then stop Docker and revert my docker-compose file to use host directories again, docker-compose up now works, and the host directories are populated correctly.
This works around my problem, but I would like to know why, and whether there is a way to make things work without this workaround.
Check if the bitnami/bitnami-docker-mariadb issue 123 is relevant in your case:
It seems that docker-compose up did not create a container from scratch (with a clean filesystem), but instead reused a preexisting one. I deduce this from the beginning sequence:
Starting mariadb_mariadb_1
Attaching to mariadb_mariadb_1
...
It seems to me that this container, in its previous execution, was started with an attached volume at /bitnami/mariadb. After that, the container was stopped, that volume was detached, and then the container was restarted. It didn't configure anything and just tried to run the mysql server binary. Since we create symbolic links from /opt/bitnami/mariadb pointing to /bitnami/mariadb (my.cnf file included), that file went missing and the binaries crashed at start time.
Could you please try using the docker-compose file we provide in this repo? If you only modify it to add environment variables, you shouldn't run into this kind of issue.
As a workaround, just run the following:
docker-compose down -v
docker-compose up
It will remove the MariaDB container, along with any associated volumes, and start from scratch. Bear in mind that you will lose any state you set in the container.
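A minimal sketch of that clean start when bind-mounted host directories are involved (the rm -rf step is an assumption and only appropriate if the data in those directories is disposable):
docker-compose down -v                       # removes the containers and their named volumes
rm -rf ./mariadb_data ./wordpress_data       # down -v does not touch bind-mounted host paths; clear them by hand
docker-compose up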

Is there any CLI way to show information about Glassfish JDBC connection pool?

The only relevant command that I found is:
NAME
list-jdbc-connection-pools - lists all JDBC connection pools
EXAMPLES
This example lists the existing JDBC connection pools.
asadmin> list-jdbc-connection-pools
sample_derby_pool
__TimerPool
Command list-jdbc-connection-pools executed successfully.
What I want is to display information about a particular connection pool, such as:
asadmin desc-jdbc-connection-pool sample_derby_pool
name: sample_derby_pool
databaseName: oracle
portNumber: 1521
serverName: test
user: testUser
...
Try running:
asadmin get * | more
The above command will display all GlassFish attributes. Pipe it to grep to get just the pool properties you are interested in:
asadmin get * | grep TimerPool
Hope this helps.
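A more targeted variant, assuming the standard dotted-name layout for resources and reusing the pool name from the question (quote the pattern so the shell does not expand the asterisk):
asadmin get "resources.jdbc-connection-pool.sample_derby_pool.*"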
