How to resolve Airflow "Could not allocate space for object" - airflow

Airflow web page shows:
"The scheduler does not appear to be running. Last heartbeat was received 6 hours ago.
The DAGs list may not update, and new tasks will not be scheduled"
Airflow is inoperable. It appears I ran out of disk space. I've manually cleared the log folder and now have disk space. When I run "airflow scheduler" I get the error messages below. I do not know how to resolve this.
airflow scheduler
[2023-02-10 21:10:54,079] {cli_action_loggers.py:105} WARNING - Failed to log action with (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Could not allocate space for object 'dbo.log'.'PK__log__3213E83F7F1F073F' in database 'airflow' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. (1105) (SQLExecDirectW)")
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) OUTPUT inserted.id VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: (datetime.datetime(2023, 2, 10, 21, 10, 54, 51696, tzinfo=Timezone('UTC')), None, None, 'cli_scheduler', None, 'root', '{"host_name": "plappnx-1", "full_command": "[\'/usr/local/bin/airflow\', \'scheduler\']"}')]
(Background on this error at: http://sqlalche.me/e/14/f405)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Could not allocate space for object 'dbo.job'.'PK__job__3213E83F7D216A15' in database 'airflow' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. (1105) (SQLExecDirectW)")
[SQL: INSERT INTO job (dag_id, state, job_type, start_date, end_date, latest_heartbeat, executor_class, hostname, unixname) OUTPUT inserted.id VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (None, <TaskInstanceState.RUNNING: 'running'>, 'SchedulerJob', datetime.datetime(2023, 2, 10, 21, 10, 54, 981528, tzinfo=Timezone('UTC')), None, datetime.datetime(2023, 2, 10, 21, 10, 54, 981540, tzinfo=Timezone('UTC')), 'SequentialExecutor', 'plappnx-1', 'root')]

The problem is related neither to Airflow nor to disk space; it's a database problem. A MAXSIZE was set when you created your database, and the database (not the Airflow log folder) has already reached that limit.
You can free some space in the database to unblock your Airflow workload, but you need a persistent solution, such as increasing the MAXSIZE or setting it to unlimited.
Here is a blog post which explains the problem and proposes some solutions.
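A minimal sketch of the persistent fix, assuming the database is named airflow (the logical file name below is an assumption; check sys.database_files for the real one):

USE airflow;
-- Inspect the current files and their limits (max_size of -1 means the file can grow until the disk is full)
SELECT name, type_desc, size, max_size, growth FROM sys.database_files;

-- Raise or remove the cap on the primary data file (logical name 'airflow' assumed)
ALTER DATABASE airflow
    MODIFY FILE (NAME = airflow, MAXSIZE = UNLIMITED, FILEGROWTH = 256MB);

-- Optionally reclaim space by purging old rows from Airflow's own metadata log table
DELETE FROM dbo.log WHERE dttm < DATEADD(day, -30, SYSUTCDATETIME());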

Related

Tokens SDK: Problems with Confidential Tokens and Observers

Setup:
Corda: 4.6
Tokens SDK: 1.2.2
Problem:
When issuing/moving Confidential Fungible and Non-Fungible tokens using Flows:
ConfidentialIssueTokens()
ConfidentialMoveFungibleTokens()
ConfidentialMoveNonFungibleTokens()
If an Observer is included, an error will occur.
When testing with a MockNetwork the following error is reported:
[ERROR] 13:45:18 [Mock network] SqlExceptionHelper. - NULL not allowed
for column "HOLDER"; SQL statement: insert into non_fungible_token
(holder, issuer, token_class, token_identifier, output_index,
transaction_id) values (?, ?, ?, ?, ?, ?) [23502-199]
When running nodes locally using Cordform the following error appears in the Observer's log:
Caused by: org.h2.jdbc.JdbcSQLIntegrityConstraintViolationException:
NULL not allowed for column "HOLDER"; SQL statement: insert into
fungible_token (amount, holder, issuer, holding_key, token_class,
token_identifier, output_index, transaction_id) values (?, ?, ?, ?, ?,
?, ?, ?) [23502-199]
The Observer will not receive the state, and a flow will be entered into their Flow Hospital. Otherwise the transaction seems to be successful: the tokens are successfully issued/moved to the appropriate parties' vaults.

Multiple airflow schedulers

I am trying to install a three-node Airflow cluster. Each node has an Airflow scheduler, an Airflow worker, and an Airflow webserver; the setup also has Celery, a RabbitMQ cluster, and a Postgres multi-master cluster (implemented with Bucardo). Software versions:
Airflow 2.0.1
PostgreSQL 13.2
Ubuntu 20.04
Python 3.8.5
Celery 4.4.7
Bucardo 5.6.0
RabbitMQ 3.8.2
I run into a problem when starting the Airflow scheduler.
When I launch the first one (the database is empty), it starts successfully.
But when I launch another scheduler on another machine (I tried launching on the same machine too), it fails with the following:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "job_pkey"
DETAIL: Key (id)=(25) already exists.
[SQL: INSERT INTO job (dag_id, state, job_type, start_date, end_date, latest_heartbeat, executor_class, hostname, unixname) VALUES (%(dag_id)s, %(state)s, %(job_type)s, %(start_date)s, %(end_date)s, %(latest_heartbeat)s, %(executor_class)s, %(hostname)s, %(unixname)s) RETURNING job.id]
[parameters: {'dag_id': None, 'state': 'running', 'job_type': 'SchedulerJob', 'start_date': datetime.datetime(2021, 4, 21, 7, 39, 20, 429478, tzinfo=Timezone('UTC')), 'end_date': None, 'latest_heartbeat': datetime.datetime(2021, 4, 21, 7, 39, 20, 429504, tzinfo=Timezone('UTC')), 'executor_class': 'CeleryExecutor', 'hostname': 'hostname', 'unixname': 'root'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
After a few launch attempts the scheduler eventually starts working. I assume the id is incremented and the data is then successfully added to the database:
airflow=> select * from job order by state;
id | dag_id | state | job_type | start_date | end_date | latest_heartbeat | executor_class | hostname | unixname
----+--------+---------+--------------+-------------------------------+-------------------------------+-------------------------------+----------------+------------------------------+----------
26 | | running | SchedulerJob | 2021-04-21 07:39:22.243721+00 | | 2021-04-21 07:39:22.243734+00 | CeleryExecutor | machine name | root
25 | | running | SchedulerJob | 2021-04-21 07:39:14.515009+00 | | 2021-04-21 07:39:19.632811+00 | CeleryExecutor | machine name | root
There is a warning for the log table as well (if the second and subsequent schedulers started successfully):
WARNING - Failed to log action with (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "log_pkey"
DETAIL: Key (id)=(40) already exists.
I understand why the scheduler cannot insert data into the table, but how should it work correctly? How do I launch multiple schedulers? The official documentation says no additional configuration is required. I hope I explained it clearly. Thanks!
Looks like there is a race condition between the Airflow Schedulers and Bucardo.
Probably the easiest way to fix it is to query all servers sequentially with a connection string like this in your airflow.cfg (the same on all nodes):
[core]
sql_alchemy_conn=postgresql://USER:PASS@/DB?host=node1:port1&host=node2&host=node3
For this to work you'll need SQLAlchemy >= 1.3.
Why this happens
There is a race condition between your schedulers and Bucardo trying to read and write data from the table on different hosts. Changes do not propagate as quickly as they should, and writes to the table fail.
Even if you treat all your nodes as "multi-master", making all nodes look at the same server first will remediate this problem. In case of failure, they will use the second one.
I asked the Airflow developers. The problem is with Bucardo, since it does not support 'SELECT ... FOR UPDATE':
I suspect that the problem is with Bucardo, which does not support record locking properly. We have high expectations, because it is a key protection mechanism against running the same task by many schedulers.
http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#database-requirements
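For illustration, this is roughly the kind of row-level locking the HA scheduler depends on and that Bucardo fails to replicate correctly (a simplified sketch, not Airflow's actual query; table and column names are only indicative):

BEGIN;
-- Each scheduler locks a batch of runs; SKIP LOCKED makes other schedulers skip
-- rows that are already locked instead of blocking or scheduling them twice.
SELECT id
FROM dag_run
WHERE state = 'running'
ORDER BY last_scheduling_decision NULLS FIRST
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- ... scheduling decisions happen while the rows stay locked ...
COMMIT;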
If that doesn't work you will have problems with duplicate keys.
Thanks!

apache airflow 1.10.9 statsd enabled making scheduler crashed

My Airflow runs in CeleryExecutor mode with PostgreSQL 12. Everything goes well except when turning statsd on:
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
The scheduler can render jobs, but the jobs are not running; the scheduler log has the error below:
[SQL: SELECT count(*) AS count_1
FROM task_instance
WHERE task_instance.pool = %(pool_1)s AND task_instance.state IN (%(state_1)s, %(state_2)s)]
[parameters: {'pool_1': 'default_pool', 'state_1': 'running', 'state_2': 'queued'}]
(Background on this error at: http://sqlalche.me/e/4xp6)
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.ProtocolViolation: invalid frontend message type 97
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1495, in _validate_and_run_task_instances
self._process_and_execute_tasks(simple_dag_bag)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (psycopg2.errors.ProtocolViolation) invalid frontend message type 97
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
If I disable statsd, everything resumes. Is this a bug in Airflow? Any advice on how to resolve it?
I faced the same error, and after a few tests I could get statsd metrics working. Typically, you will see the error if the following conditions are met (see the config sketch after this list).
Statsd enabled set to True
SQLAlchemy connection pool set to True
Scheduler syserr log enabled (by redirecting the err log to a file where you can see this error)
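For reference, a sketch of the offending combination in airflow.cfg (key names as in Airflow 1.10.x, where the statsd options live under [scheduler]; defaults may differ in other versions):

[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow

[core]
# connection pooling left at its default (enabled)
sql_alchemy_pool_enabled = True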
In my case, even though the scheduler kept throwing the error logs, statsd metrics were still delivered and tasks were also scheduled as they should be. I don't know how to measure the impact, and I don't want to sacrifice the SQLAlchemy connection pool, so I leave statsd turned off.
(I guess other people are not seeing the error because they are missing the third condition above.)

Doctrine(Symfony4) is storing data as invalid HEX values

When I was about to deploy my Symfony4 app on Ubuntu 18 (php7.1-fpm + Apache), I executed some commands to load default data and some fixtures. The problem is that I always receive SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0xcd 0x73. I noticed that the affected entity fields are the ones mapped as array, json, or simple_array.
Here is an example of one of those fields value:
\x65\x6d\x70\x72\x65\x73\x61\x20\x64\x65\x20\x61\x73\x65\x67\x75\x72\x61\x6d\x69\x65\x6e\x74\x6f\x20\x6c\x6f\x67\xcd\x73\x74\x69\x63\x6f\x20\x61\x6c\x20\x74\x61\x62\x61\x63\x6f
That is the value for an array of strings.
The database config is set to UTF-8, as is the php.ini configuration; the database server was also created using UTF-8.
How can I fix this? I've created the database several times but the same result remains.
Thanks in advance!!
UPDATE
When I repeat the process on Windows none of this happens...
UPDATE
Here the complete crash log
[2019-10-08 15:21:26] doctrine.DEBUG: INSERT INTO ext_log_entries (id, action, logged_at, object_id, object_class, version, data, username) VALUES (?, ?, ?, ?, ?, ?, ?, ?) {"1":2042,"2":"create","3":"2019-10-08 15:21:24","4":2042,"5":"App\\Entity\\SeaShipment","6":1,"7":{"manifest":"0323/2019","dmNumber":null,"arrivedAt":"2019-09-16 23:00:00","companyName":"MAQUIMPORT","agencyName":"MINAGRI","contractNumber":null,"merchandiseDescription":null,"countryName":null,"dmNumberAt":null,"etaAt":null,"funderName":null,"customerName":null,"empoweredName":null,"buyerName":null,"docsReceivedAt":null,"originalDocsReceivedAt":null,"billingDeliveredAt":null,"funderBilling":null,"deliveredCustomerAt":null,"isUpdatable":null,"createdFromIp":null,"lastUpdatedFromIp":null,"createdBy":null,"lastUpdatedBy":null,"createdAt":"2019-10-08 15:21:20","lastUpdatedAt":"2019-10-08 15:21:20","deletedAt":null,"seaShipmentType":null,"bl":"2019-M-001147","destinationDock":"TCM","isReleasedHouse":true,"isReleasedMaster":true,"isLocked":false,"isEnabled":true,"daysWithoutDm":0,"daysInTcm":3,"location":"B06","weight":8562,"yard":null,"cabotage":null,"transferedAt":"2019-09-16 14:25:00","transferedTo":"(binary value)","containerNumber":"MAGU5169507","containerType":"HC","containerDimention":40,"lastMarielReportAt":"2019-09-19 23:00:00","shippingCompanyName":"NIRINT","isActive":true,"shipName":null,"journey":null,"originDock":null,"blAt":null,"correspondentName":null,"forwarderName":null,"downloadUngroupAt":null,"beDeliveredAt":null,"packageQuantity":null,"shippingCompany":{"id":26}},"8":null} []
For other similar data or transactions before this one, the problem does not happen.
Can it be that your database doesn't accept Cyrillic/Arabic etc. alphabets?
If yes, this may help (if you use MySQL):
Add to the file /etc/mysql/my.cnf:
[mysqld]
collation-server = utf8mb4_bin
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4
skip-character-set-client-handshake
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
After that:
sudo service mysql restart
then drop the database and create it from scratch.
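To confirm the new settings took effect after the restart, you can check the character set and collation variables (standard MySQL statements; the values shown depend on your installation):

-- run from a mysql client session after the restart
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';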

EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set

I'm trying to run some simple R code in SQL Server 2016:
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet',
    @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS (([hello] int not null));
GO
I have followed this link to configure: https://tomaztsql.wordpress.com/2016/07/26/enabling-sp_execute_external_script-to-run-r-scripts-in-sql-server-2016/
I'm getting the error:
Msg 39023, Level 16, State 1, Procedure sp_execute_external_script,
Line 1 [Batch Start Line 0]
'sp_execute_external_script' is disabled on this instance of SQL
Server. Use sp_configure 'external scripts enabled' to enable it.
Msg 11536, Level 16, State 1, Line 1
EXECUTE statement failed because its WITH RESULT SETS clause specified 1
result set(s), but the statement only sent 0 result set(s) at run time.
When I checked with:
EXECUTE sp_configure;
GO
The result looks like this:
name minimum maximum config_value run_value
external scripts enabled 0 1 1 0
Why is the run_value still 0 (note: I have restarted SQL Server Launchpad)? What is the resolution for this?
Issue resolved: you need to restart the SQL Server services, which restarts everything. It's working fine now.
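For reference, the usual sequence is to set the option, run RECONFIGURE, restart the SQL Server and Launchpad services, and then confirm that run_value has become 1 (standard sp_configure usage):

EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
-- restart the SQL Server and SQL Server Launchpad services, then verify:
EXEC sp_configure 'external scripts enabled';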
