Facing (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") in cloud composer - airflow

I am facing this issue:
(2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
on Cloud Composer version composer-1.16.5-airflow-1.10.14. It is an intermittent issue. We have tried cleaning our Airflow metadata and modified the code (for example, replacing Variable.get() with the Jinja template) to reduce the load on the database, but we are still facing this issue daily. We also restarted the scheduler, but the issue started occurring again after two days. The CPU and memory usage graphs of the Airflow database in Composer monitoring are constant, yet the SQL database goes into an unhealthy state after some time.
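The change we made looks roughly like this (a minimal sketch; the DAG, task, and variable names are illustrative rather than our real ones):

from datetime import datetime
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator

with DAG("variable_example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    # Before: Variable.get() runs on every scheduler parse of this file,
    # which means one metadata-DB hit per parse cycle.
    config_value = Variable.get("my_config")
    use_config_old = BashOperator(
        task_id="use_config_old",
        bash_command="echo %s" % config_value,
    )

    # After: the Jinja template is only rendered when the task actually runs,
    # so parsing the file no longer touches the database.
    use_config_new = BashOperator(
        task_id="use_config_new",
        bash_command="echo {{ var.value.my_config }}",
    )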
The whole error message is as follows:
Traceback (most recent call last):
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
return fn()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 364, in connect
return _ConnectionFairy._checkout(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
return self._create_connection()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
self.__connect(first_connect_check=True)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 493, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/__init__.py", line 85, in Connect
return Connection(*args, **kwargs)
File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 208, in __init__
super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

There could be multiple reasons; the error itself is quite general, which leaves many possibilities for what could go wrong. Known causes:
Connections are blocked by firewall rules.
This can also temporarily happen while an instance is being restarted.
Generic GKE failures because nodes with airflow-sqlproxy are overloaded.
Since it's an intermittent issue, we can be fairly sure connections are not being blocked by firewall rules. We might have to check whether any instances have been restarted. And lastly, to avoid generic GKE failures, you can upgrade your machine types to allocate more resources.
Also, as I already mentioned in the comments, you're using an old version of Composer, which has been out of support since May 2022. It's always better to upgrade your Composer environment to a version that is still supported by Google.

Related

Custom timetable not registered by airflow webserver in Cloud Composer 1

I've recently created a custom timetable. It worked perfectly locally (python==3.9.12, airflow==2.3.0), so I decided to upload it to the plugins folder in my Cloud Composer environment (version==1.18.11, airflow==2.2.5). While the scheduler picks up the timetable and the DAG runs based on it, trying to open the DAG in the UI throws this error window:
Something bad has happened.
Airflow is used by many users, and it is very likely that others had similar problems and you can easily find
a solution to your problem.
Consider following these steps:
* gather the relevant information (detailed logs with errors, reproduction steps, details of your deployment)
* find similar issues using:
* GitHub Discussions
* GitHub Issues
* Stack Overflow
* the usual search engine you use on a daily basis
* if you run Airflow on a Managed Service, consider opening an issue using the service support channels
* if you tried and have difficulty with diagnosing and fixing the problem yourself, consider creating a bug report.
Make sure however, to include all relevant details and results of your investigation so far.
Python version: 3.8.12
Airflow version: 2.2.5+composer
Node: 67b211ed8faa
-------------------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/python3.8/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/auth.py", line 51, in decorated
return func(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/decorators.py", line 108, in view_func
return f(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/decorators.py", line 71, in wrapper
return f(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/views.py", line 2328, in tree
dag = current_app.dag_bag.get_dag(dag_id)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py", line 186, in get_dag
self._add_dag_from_db(dag_id=dag_id, session=session)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py", line 261, in _add_dag_from_db
dag = row.dag
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/serialized_dag.py", line 180, in dag
dag = SerializedDAG.from_dict(self.data) # type: Any
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 951, in from_dict
return cls.deserialize_dag(serialized_obj['dag'])
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 877, in deserialize_dag
v = _decode_timetable(v)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 167, in _decode_timetable
raise _TimetableNotRegistered(importable_string)
airflow.serialization.serialized_objects._TimetableNotRegistered: Timetable class '<enter_your_timetable_plugin_name>.<enter_your_timetable_class_name>' is not registered
Going to the Plugins window shows that no plugins are added (this happens in both environments, including Cloud Composer==2.0.15, airflow==2.2.5), while my local setup loads the plugin properly.
What's really interesting is that, despite having the same Airflow version, the two Cloud Composer versions behave differently.
I don't override any of the default Airflow variables, and that shouldn't impact anything described here.
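For reference, the plugin module follows the standard registration pattern; here is a stripped-down sketch with placeholder names rather than my full timetable logic:

# plugins/my_timetable_plugin.py -- placeholder names, scheduling logic trimmed
from airflow.plugins_manager import AirflowPlugin
from airflow.timetables.base import DataInterval, Timetable


class MyCustomTimetable(Timetable):
    # The real implementation computes proper data intervals; this stub only
    # shows the registration mechanics.
    def infer_manual_data_interval(self, *, run_after):
        return DataInterval.exact(run_after)

    def next_dagrun_info(self, *, last_automated_data_interval, restriction):
        return None  # stub: never schedules automatically


class MyTimetablePlugin(AirflowPlugin):
    name = "my_timetable_plugin"
    timetables = [MyCustomTimetable]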
Many many thanks for any suggestions.

"Bad Request-Error" when trying to connect to Azure Data Lake with Airflow

I am trying to connect to Azure Data Lake using Airflow, with the connection set up via the Web UI.
When I test the connection using the test button, I get a Bad Request error.
I use the correct UUIDs. These UUIDs have been verified in other cases. I also checked the firewall.
When I execute the DAG, I use the Azure Data Lake connection id to check whether a file exists, applying the method described here: What is the best way to check if a file exists on an Azure Datalake using Apache Airflow?
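Roughly, the existence check looks like this (a sketch: the connection id and path are placeholders, and the hook import path is the one from the microsoft-azure provider I have installed):

from airflow.providers.microsoft.azure.hooks.data_lake import AzureDataLakeHook


def adl_file_exists(path: str) -> bool:
    # The hook picks up client_id / secret / tenant / account name
    # from the connection configured in the Web UI.
    hook = AzureDataLakeHook(azure_data_lake_conn_id="azure_data_lake_default")
    return hook.check_for_file(file_path=path)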
This is the error I get
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:The token response from the server is unparseable as JSON: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
wire_response = json.loads(body)
File "/usr/lib/python3.8/json/init.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 3 column 1 (char 4)
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:Error validating get token response: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 238, in _handle_get_token_response
return self._validate_token_response(body)
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
Authenticating to Azure Data Lake is done with token credentials, i.e. add the specific credentials (client_id, secret, tenant) and the account name to the Airflow connection.
Information about how to set it up can be found in this doc.
You can see a code example in the source code test function.
Other methods of authentication are currently not supported.
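As a rough illustration of the fields the hook expects on the connection (values are placeholders, and the exact extra key names may differ between provider versions, so verify them against your installed provider; the same values can be entered through the Web UI):

import json
from airflow.models import Connection

adl_conn = Connection(
    conn_id="azure_data_lake_default",
    conn_type="azure_data_lake",
    login="<client_id>",            # service principal application id
    password="<client_secret>",     # service principal secret
    extra=json.dumps({
        "tenant": "<tenant_id>",
        "account_name": "<data_lake_store_name>",
    }),
)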
I was trying to get the connection running using the Airflow implementation. My impression was that it was buggy and did not work out well. The above situation happened with Airflow 2.2.5. When I upgraded to Airflow 2.3.0, the test button was grayed out.
The final solution was to use Access Tokens instead.

Airflow SFTP upload using public key file

I am trying to upload a file to an SFTP server using a key file. I have already configured the connection and I can authenticate without any problem:
{'key_file': '/my_folder/public_key'}
I am also able to do the whole process manually, using Cyberduck for example. This is the code that I am calling:
from contextlib import closing
from airflow.contrib.hooks.ssh_hook import SSHHook

# Get connection details
ssh = SSHHook(ssh_conn_id='my conn id')

# Upload the file into sftp
with closing(ssh.get_conn().open_sftp()) as sftp_client:
    sftp_client.put('/local_folder/my_file.xlsx', '/sftp_folder/my_file.xlsx')
This is the error I am receiving:
{base_hook.py:80} INFO - Using connection to: xxxxxxx
{transport.py:1687} INFO - Connected (version 2.0, client AWS_SFTP_1.0)
{transport.py:1687} INFO - Authentication (publickey) successful!
PermissionError: [Errno 13] Forbidden
Does anyone have any idea why this is happening, given that I am able to do the same thing manually?
Thank you so much!
The whole stack:
{transport.py:1687} INFO - Authentication (publickey) successful!
{sftp.py:131} INFO - [chan 0] Opened sftp connection (server version 3)
Traceback (most recent call last):
File "/.../airflow/plugins/operators/my_operator.py", line 231, in sftp_upload
client.put(local_path, sftp_path)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 727, in put
return self.putfo(fl, remotepath, file_size, callback, confirm)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 683, in putfo
with self.file(remotepath, 'wb') as fr:
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 341, in open
t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 780, in _request
return self._read_response(num)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 832, in _read_response
self._convert_status(msg)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 863, in _convert_status
raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Forbidden
The problem I was facing was due to an invalid path on the SFTP server. Cyberduck was hiding part of the path, so I was including an incomplete one in my code. Paramiko was returning Forbidden probably because the path exists but this account doesn't have access to it.
Once I included the full path, the code above worked just fine!
Thanks!
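In case it helps anyone else, a quick way to see what the account can actually reach is to list the remote directories before calling put() (a sketch; the paths and connection id are placeholders):

from contextlib import closing
from airflow.contrib.hooks.ssh_hook import SSHHook

ssh = SSHHook(ssh_conn_id='my conn id')
with closing(ssh.get_conn().open_sftp()) as sftp_client:
    # Inspect what this account can actually see before uploading.
    print(sftp_client.listdir('/'))
    sftp_client.put('/local_folder/my_file.xlsx',
                    '/real/full/sftp_folder/my_file.xlsx')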

AttributeError: module 'select' has no attribute 'poll'

I'm running eventlet.monkey_patch() while trying to spin up a Flask server that uses Flask-SocketIO. This is the traceback:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/alhasan/MeetupPoint/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 777, in inner
srv.serve_forever()
File "/home/alhasan/MeetupPoint/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 612, in serve_forever
HTTPServer.serve_forever(self)
File "/usr/lib64/python3.6/socketserver.py", line 232, in serve_forever
with _ServerSelector() as selector:
File "/usr/lib64/python3.6/selectors.py", line 348, in __init__
self._poll = select.poll()
AttributeError: module 'select' has no attribute 'poll'
I tried using monkey_patch, as previously I encountered the following error:
RuntimeError: You need to use the eventlet server. See the Deployment section of the documentation for more information.
I have eventlet installed.
...
eventlet==0.23.0
Flask==0.12.2
Flask-Migrate==2.1.1
Flask-Script==2.0.6
Flask-SocketIO==3.0.1
...
Is there a fix for this?
My initial problem was that my server returned bad requests every time I tried to emit a message from the client, but the other way around works. I would really appreciate any kind of solution. :)
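For reference, the pattern I'm aiming for is roughly the standard eventlet + Flask-SocketIO setup (a simplified sketch, not my exact code; names and ports are illustrative):

# eventlet.monkey_patch() runs before any other import, as eventlet recommends.
import eventlet
eventlet.monkey_patch()

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, async_mode='eventlet')

if __name__ == '__main__':
    # Serve through socketio.run() so the eventlet web server is used
    # instead of Werkzeug's development server.
    socketio.run(app, host='0.0.0.0', port=5000)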

Cloudify nodecellar,Task failed 'script_runner.tasks.run' -> RecoverableError('ProcessException: ',)

When I try to install nodecellar with Cloudify, I am getting the following error:
2015-07-13T17:31:03 LOG <nodecellar> [mongod_a50aa.configure] ERROR: Exception raised on operation [script_runner.tasks.run] invocation
Traceback (most recent call last):
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/cloudify/decorators.py", line 125, in wrapper
result = func(*args, **kwargs)
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/script_runner/tasks.py", line 58, in run
return process_execution(script_func, script_path, ctx, process)
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/script_runner/tasks.py", line 74, in process_execution
script_func(script_path, ctx, process)
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/script_runner/tasks.py", line 143, in execute
stderr_consumer.buffer.getvalue())
How can I fix this problem?
This exception is raised by the Cloudify Script Plugin when a script you ran exits with a non-zero error code. Here is the source of that error.
The script that returned the non-zero code is the one mapped to the configure operation on the mongod node. Which script that is depends on the version of the Nodecellar blueprint you are using.
I can't give a more detailed answer without information regarding the specific blueprint version, which Cloudify version you have installed, details about your provider (local, Vagrant, OpenStack, AWS), and OS (Ubuntu, CentOS, etc.).
