Airflow duplicating logs

I'm using Airflow 1.10.4 and I'm facing a problem when I use logging in my custom modules.
My code looks like this:
import logging
log = logging.getLogger(__name__)
log.info('hello there')
And the output looks like this:
[2020-01-22 09:44:29,954] {{logging_mixin.py:95}} INFO - [2020-01-22 09:44:29,954] {{base_hook.py:84}} INFO - hello there.
The time, file name and logging level are duplicated on every line of my log.
How can I use Airflow's logging in my custom modules to avoid this? (I used self.log in my operators and there is no problem there.)

Check the log_format value in your airflow.cfg file.
My default config:
# Log format
log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s

Since the default logging format of Airflow is [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s, you'll always have duplicates in the log: the line produced by your own logger is captured and logged again by Airflow (note the logging_mixin.py prefix), so the datetime, for example, is shown once in asctime and once in the message itself. To fix that, you need to change the logging format of Airflow to show only the message.
If you're using docker-compose.yaml you need to set it there like this:
AIRFLOW__LOGGING__LOG_FORMAT: '%(message)s'
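For context, in a docker-compose.yaml file this would typically go under the environment mapping of the Airflow services; a minimal sketch, with an illustrative service name:
services:
  airflow-webserver:
    environment:
      # show only the raw message, so Airflow's own prefix is not duplicated
      AIRFLOW__LOGGING__LOG_FORMAT: '%(message)s'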
Otherwise you have to set it in the airflow.cfg as @AliNadi mentioned:
log_format = %%(message)s
Note: I had an additional issue - for me the message was also duplicated. The reason was that in my Python code I was calling logging.basicConfig() and then additionally configuring the logger (logger = logging.getLogger(__name__); logger.setLevel(logging.DEBUG); ...). The solution was to remove the logging.basicConfig() call and keep only my custom logging configuration.
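As a minimal sketch of what the module-level logging can look like after removing logging.basicConfig() (names are illustrative; the logger simply propagates to the handlers Airflow has already configured):
import logging

# No logging.basicConfig() here: installing an extra root handler is what
# caused the message to be emitted twice.
log = logging.getLogger(__name__)

def my_helper():
    # Formatted exactly once, by Airflow's configured handler.
    log.info('hello there')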

Airflow DeprecationWarning

I'm running a distributed Airflow 2.4.0 setup using the official Docker image. All the containers use the same .env file and the same version of the Airflow image. When I log into one of the Airflow containers I get these warnings:
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:545: DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:545: DeprecationWarning: The auth_backend option in [api] has been renamed to auth_backends - the old setting has been used, but please update your config.
option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:367: FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
FutureWarning,
I checked the airflow.cfg inside the container and it has the up-to-date settings. Why do I still get the warning messages?
You are seeing these warnings because of the section each option is defined in. airflow.cfg is a configuration file organized into sections, and every setting is expected to be in the proper section.
In your case your airflow.cfg has sql_alchemy_conn where you override the default value. Prior to 2.3.0 this setting lived in the core section; in 2.3.0 it was moved to the database section (see PR).
What you need to do is simply open airflow.cfg and move the setting to the proper section. For example:
[core]
sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
to:
[database]
sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
The reason why it's like that is also explained in the docs. Airflow references settings through environment variables of the format AIRFLOW__{SECTION}__{KEY}, so in this case it will be
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN, which is why the section is important for finding the variable.
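Note that the traceback mentions _get_environment_variables, so if these values come from the shared .env file rather than from airflow.cfg, the same renaming applies to the environment variable names. A hedged example (the values are placeholders):
# deprecated names that trigger the warnings
AIRFLOW__CORE__SQL_ALCHEMY_CONN=...
AIRFLOW__API__AUTH_BACKEND=...
# current names
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=...
AIRFLOW__API__AUTH_BACKENDS=airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session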

MariaDB default charset

The MariaDB documentation says clearly how to set up server defaults like charset and collation. So I opened my MariaDB console and ran:
SET character_set_server = 'utf8mb4';
SET collation_server = 'utf8mb4_slovak_ci';
The console replied OK.
Then I restarted the server, but when I tried to create a new database it still got the same latin2 charset and Swedish collation.
I create the database automatically via the Symfony console command
php bin/console doctrine:database:create
What is wrong with that? I did it the way the documentation says.
SET character_set_server changes the character set for the current connection.
SET global character_set_server changes the character set globally for each new connection.
However, if you restart your server, the default character sets are read from the configuration file. If the configuration file doesn't specify a character set, the built-in defaults are used. So to make your settings permanent, you have to change the character sets in your configuration file (https://mariadb.com/kb/en/configuring-mariadb-with-option-files/).
First, run SHOW VARIABLES like "%collation%"; to see your current configuration.
To change the collation_server setting, you have to use the GLOBAL keyword, so your statement becomes SET GLOBAL collation_server = 'utf8mb4_slovak_ci';
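To make the defaults survive a restart, the usual approach is to put them in the server option file; a minimal sketch (the file location is distribution-specific, e.g. /etc/my.cnf or a file under /etc/mysql/mariadb.conf.d/):
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_slovak_ci
After editing the file, restart MariaDB and verify with SHOW VARIABLES like "%collation%"; again.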

Encrypt sql_alchemy_conn in airflow config file (ansible)

Is there a way to encrypt the sql_alchemy_conn string in the Airflow config file? The password shown in the example is plaintext. What options are there to secure it? Also, if the password has special characters, how must it be escaped in the config file?
I'm trying to install Airflow using an Ansible airflow role.
# See: https://www.sqlalchemy.org/
sql_alchemy_conn:
  value: "postgresql+psycopg2://pgclusteradm@servername:PLAINTEXTPASSWORD@server.postgres.database.azure.com/airflow2"
Is there a way to encrypt the password? I couldn't find how to do this.
You can provide the database URI through an environment variable instead of the config file. This doesn't encrypt it or necessarily make it more secure, but at least it isn't sitting plainly in a permanent file.
In your airflow.cfg you can put a placeholder:
[core]
...
sql_alchemy_conn = override_me
...
Then set AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://... in an environment variable when you bring up Airflow components. This way of setting and overriding configuration options through environment variables is detailed in the docs, but the basic format is AIRFLOW__{SECTION}__{KEY}=<value>.
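On the escaping part of the question: special characters in the password portion of a SQLAlchemy URI are normally percent-encoded. A small illustrative sketch (the credentials and host are made up):
import urllib.parse

# Percent-encode a password containing URI-reserved characters (@ : / # ...)
password = urllib.parse.quote_plus('p@ss:w/rd#1')
uri = f'postgresql+psycopg2://airflow:{password}@dbhost:5432/airflow'
print(uri)  # postgresql+psycopg2://airflow:p%40ss%3Aw%2Frd%231@dbhost:5432/airflow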
There are 2 ways of securing this as mentioned in docs:
1) Environment Variable:
You can override the setting in airflow.cfg by setting the following environment variable:
AIRFLOW__CORE__SQL_ALCHEMY_CONN=my_conn_string
This way you can keep the setting in airflow.cfg empty, so no one can view the password there.
2) Get the string by running a command:
You can also derive the connection string at run time by appending _cmd to the key, like this:
[core]
sql_alchemy_conn_cmd = bash_command_to_run
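For example (an illustrative sketch; the file path is hypothetical), the command could read the URI from a file readable only by the airflow user:
sql_alchemy_conn_cmd = cat /etc/airflow/sql_alchemy_conn.secret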

Weblogic 12C sending logs to syslog

I want to send my WebLogic logs to syslog. Here is what I have done so far.
1. Included the following log4j.properties in the managed server classpath:
log4j.rootLogger=DEBUG,syslog
log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
log4j.appender.syslog.Threshold=DEBUG
log4j.appender.syslog.Facility=LOCAL7
log4j.appender.syslog.FacilityPrinting=false
log4j.appender.syslog.Header=true
log4j.appender.syslog.SyslogHost=localhost
log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
log4j.appender.syslog.layout.ConversionPattern=[%p] %c:%L - %m%n
2. Added the following to the managed server arguments:
-Dlog4j.configuration=file:<path to log4j properties file> -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JLogger -Dweblogic.log.Log4jLoggingEnabled=true
3. Added wllog4j.jar and log4j-1.2.14.jar into the domain's lib folder.
4. Then, from the Admin console, changed the logging implementation: "my_domain_name" ---> Configuration ---> Logging ---> (Advanced options) ---> Logging implementation: Log4J.
Then restarted the managed server.
I used this as a reference, but I didn't get anything in syslog (/var/log/messages). What am I doing wrong?
I would recommend a couple of items to check:
Remove any space in DEBUG, syslog in the properties file (it should read DEBUG,syslog).
Your last two server arguments have a space between the - and the D, so make sure that wasn't just a copy-and-paste error in this post.
Double check that the log4j properties file is actually on the classpath.
Double check with a ps command that the -D options made it correctly into the start command that was executed.
Make sure that the managed server has its own copy of the JARs, as they should get synchronized from the admin server during the restart.
Hopefully something in there will help or give an idea of what to look for.
--John
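For the ps check, something along these lines (an illustrative one-liner) shows whether the -D options ended up on the running JVM:
ps -ef | grep -i log4j.configuration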
I figured out the problem. My appender was working fine; the problem was in rsyslog.conf. I just uncommented the following properties:
# Provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514
We were sending the messages, but the listener was absent, so rsyslog didn't know what to do with them.
The line *.debug;mail.none;authpriv.none;cron.none /var/log/messages then tells rsyslog where to redirect the information (debug level, in this case), namely to the messages file.
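After uncommenting, that part of /etc/rsyslog.conf reads (rsyslog then listens for syslog datagrams on UDP port 514):
# Provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514
Restart rsyslog afterwards (e.g. service rsyslog restart) so the change takes effect.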

configuring Airflow to work with CeleryExecutor

I'm trying to configure Airbnb Airflow to use the CeleryExecutor like this:
I changed the executor in airflow.cfg from SequentialExecutor to CeleryExecutor:
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = CeleryExecutor
But I get the following error:
airflow.configuration.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor
Note that the sql_alchemy_conn is configured like this:
sql_alchemy_conn = sqlite:////root/airflow/airflow.db
I looked at Airflow's Git repository (https://github.com/airbnb/airflow/blob/master/airflow/configuration.py)
and found that the following code throws this exception:
def _validate(self):
    if (
            self.get("core", "executor") != 'SequentialExecutor' and
            "sqlite" in self.get('core', 'sql_alchemy_conn')):
        raise AirflowConfigException("error: cannot use sqlite with the {}".
                                     format(self.get('core', 'executor')))
It seems from this _validate method that sql_alchemy_conn cannot contain sqlite.
Do you have any idea how to configure the CeleryExecutor without SQLite? Please note that I downloaded RabbitMQ for working with the CeleryExecutor, as required.
Airflow states that the CeleryExecutor requires a backend other than the default SQLite database. You have to use MySQL or PostgreSQL, for example.
The sql_alchemy_conn in airflow.cfg must be changed to follow the SQLAlchemy connection string structure (see the SQLAlchemy documentation).
For example,
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@127.0.0.1:5432/airflow
To configure Airflow for MySQL:
First install MySQL (this might help, or just search for instructions).
Go to the Airflow installation directory, usually /home/<user>/airflow.
Edit airflow.cfg and locate
sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db
If you still have the default SQLite setting, add # in front of it so it looks like
#sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db
Then add this line below it (a concrete example follows these steps):
sql_alchemy_conn = mysql://<user>:<password>@localhost:3306/<database>
Save the file and run
airflow initdb
and you're done!
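For illustration, with made-up credentials and a database named airflow (and a MySQL driver such as mysqlclient or PyMySQL installed), the line might look like:
sql_alchemy_conn = mysql+pymysql://airflow_user:airflow_pass@localhost:3306/airflow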
As other answers have stated, you need to use a different database besides SQLite. Additionally you need to install RabbitMQ, configure it appropriately, and change each of your airflow.cfg files to have the correct RabbitMQ information. For an excellent tutorial on this, see A Guide On How To Build An Airflow Server/Cluster.
If you run it on a Kubernetes cluster, use the following config:
airflow:
  config:
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow
