"Bad Request-Error" when trying to connect to Azure Data Lake with Airflow - airflow

I am trying to connect to Azure Data Lake from Airflow, using a connection defined in the Airflow web UI.
When I test the connection with the Test button, I get a Bad Request error.
I use the correct UUIDs; they have been verified in other setups. I also checked the firewall.
When I execute the DAG, I use the Azure Data Lake connection ID to check whether a file exists, applying the method described in "What is the best way to check if a file exists on an Azure Datalake using Apache Airflow?"
This is the error I get:
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:The token response from the server is unparseable as JSON: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
wire_response = json.loads(body)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 3 column 1 (char 4)
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:Error validating get token response: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 238, in _handle_get_token_response
return self._validate_token_response(body)
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response

Authentication to Azure Data Lake uses token credentials, i.e. you add specific credentials (client_id, secret, tenant) and the account name to the Airflow connection.
Information about how to set it up can be found in the documentation for this connection.
A code example is available in the test function in the source code.
Other methods of authentication are currently not supported.
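As a rough illustration of the fields listed above, the sketch below checks them before they are entered into the Airflow connection. It uses only the standard library and does not contact Azure. The function name `check_adl_credentials` and the dictionary keys are hypothetical, and it assumes the client ID and tenant are supplied as UUIDs, as in the question.

```python
import uuid

# Field names follow the answer above; the dict layout itself is an assumption.
REQUIRED_FIELDS = ("client_id", "secret", "tenant", "account_name")


def check_adl_credentials(fields: dict) -> list:
    """Return a list of problems; an empty list means the fields look plausible."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not str(fields.get(name) or "").strip():
            problems.append(f"{name} is missing or empty")
    # Assumption: client_id and tenant are given as UUIDs, as in the question.
    for name in ("client_id", "tenant"):
        value = str(fields.get(name) or "").strip()
        if value:
            try:
                uuid.UUID(value)
            except ValueError:
                problems.append(f"{name} is not a valid UUID")
    return problems


if __name__ == "__main__":
    candidate = {
        "client_id": "00000000-0000-0000-0000-000000000001",  # placeholder
        "secret": "",
        "tenant": "not-a-uuid",
        "account_name": "mystore",  # placeholder store name
    }
    for problem in check_adl_credentials(candidate):
        print(problem)
```

A check like this only catches empty or malformed values. A well-formed but wrong UUID or secret will still be rejected by the server.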

I was trying to get the connection running using the Airflow implementation. My impression was that it was buggy and did not work well. The situation above occurred with Airflow 2.2.5. After I upgraded to Airflow 2.3.0, the test button was grayed out.
The final solution was to use access tokens instead.
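For context on the traceback in the question: the log states that the token response was "unparseable as JSON", which means the body returned to adal was not a JSON document (for example, an HTML or plain-text error page). The sketch below reproduces the same decoder message with a made-up body. The body text is hypothetical, since the real response is masked as *** in the log.

```python
import json


def explain_token_body(body: str) -> str:
    """Return 'ok' if body parses as JSON, else the decoder's error message."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return f"not JSON: {err}"
    return "ok"


if __name__ == "__main__":
    # Hypothetical body: two CRLF line breaks followed by non-JSON text.
    fake_body = "\r\n\r\n<html>Bad Request</html>"
    print(explain_token_body(fake_body))
    # A body that is valid JSON parses without error.
    print(explain_token_body('{"access_token": "placeholder"}'))
```

With this body the decoder reports "Expecting value: line 3 column 1 (char 4)", the same position as in the log. Logging the raw response body (with secrets removed) is usually the quickest way to see what the server actually sent.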

Facing (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") in cloud composer

I am facing this issue:
(2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
This happens on Cloud Composer, version composer-1.16.5-airflow-1.10.14, and it is intermittent. We have tried cleaning the Airflow metadata and modifying the code (for example, replacing Variable.get() with a Jinja template) to reduce the load on the database, but the error still occurs daily. We also restarted the scheduler, but the error came back after two days. The CPU and memory usage graphs for the Airflow database in Composer monitoring stay constant, yet the SQL database goes into an unhealthy state after some time.
The whole error message is:
Traceback (most recent call last):
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
return fn()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 364, in connect
return _ConnectionFairy._checkout(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
return self._create_connection()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
self.__connect(first_connect_check=True)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 493, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/__init__.py", line 85, in Connect
return Connection(*args, **kwargs)
File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 208, in __init__
super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
The error message is very general, so there are several possible causes. Known causes:
Connections are blocked by firewall rules.
An instance is being restarted, which can cause the error temporarily.
Generic GKE failures because the nodes running airflow-sqlproxy are overloaded.
Since the issue is intermittent, firewall rules are unlikely to be the cause. Check whether any instances were restarted around the time of the errors. To avoid generic GKE failures, you can upgrade your machine types to allocate more resources.
Also, as I already mentioned in the comments, you are using an old version of Composer that has been out of support since May 2022. It is better to upgrade to a Composer version that Google still supports.

Error running source run-gateway for Google IOT

I have been looking for help with this problem without much success. I keep getting the error below. I was following this guide: https://cloud.google.com/community/tutorials/cloud-iot-gateways-rpi and I haven't been able to get past step 14.
source run-gateway
Creating JWT using RS256 from private key file rsa_private.pem
on_publish, userdata None, mid 1
Unable to find key 1
connect status False
on_connect Connection Refused: not authorised.
on_disconnect 5: The connection was refused.
connect status False
connect status False
connect status False
^CTraceback (most recent call last):
File "./cloudiot_mqtt_gateway.py", line 356, in <module>
main()
File "./cloudiot_mqtt_gateway.py", line 284, in main
time.sleep(1)
I was able to get the gateway running, but I had to modify the script manually. Make sure that you have updated the run-gateway shell script to point to your registry ID and project ID.
If any of the parameters are incorrect (e.g. device, project, region), your device will be disconnected from the device bridge.
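To see why a single wrong parameter leads to a refused connection, it helps to look at how the MQTT client ID is assembled. Cloud IoT Core expects the client ID to be the full device path built from the project, region, registry and device (here, the gateway) IDs. The sketch below builds that path and rejects empty values. The function name and the sample values are hypothetical, and the tutorial script may assemble the ID differently.

```python
def build_client_id(project_id: str, cloud_region: str,
                    registry_id: str, device_id: str) -> str:
    """Build the MQTT client ID path; raise ValueError if any part is empty."""
    parts = {
        "project_id": project_id,
        "cloud_region": cloud_region,
        "registry_id": registry_id,
        "device_id": device_id,
    }
    empty = [name for name, value in parts.items() if not value or not value.strip()]
    if empty:
        raise ValueError("empty parameter(s): " + ", ".join(empty))
    return "projects/{}/locations/{}/registries/{}/devices/{}".format(
        project_id, cloud_region, registry_id, device_id
    )


if __name__ == "__main__":
    # Placeholder values; substitute the ones set in your run-gateway script.
    print(build_client_id("my-project", "us-central1", "my-registry", "my-gateway"))
```

Printing the assembled path and comparing it with the registry and gateway shown in the Cloud console is a quick way to spot a mistyped ID or region.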

GraqlSemanticException-label 'database' not found. Please check server logs for the stack trace

I was trying to get the BioGrakn use case for BLAST working, following the steps in this video series. I was able to load the schema, but the next step is to run python migrate.py to load the data. When I ran the command, I got the following traceback:
Traceback (most recent call last):
File "/home/aditya/anaconda3/envs/RD/lib/python3.7/site-packages/grakn/service/Session/TransactionService.py", line 161, in send
response = next(self._response_iterator)
File "/home/aditya/anaconda3/envs/RD/lib/python3.7/site-packages/grpc/_channel.py", line 364, in __next__
return self._next()
File "/home/aditya/anaconda3/envs/RD/lib/python3.7/site-packages/grpc/_channel.py", line 358, in _next
raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "GraqlSemanticException-label 'database' not found. Please check server logs for the stack trace."
debug_error_string = "{"created":"#1582269484.666990683","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"GraqlSemanticException-label 'database' not found. Please check server logs for the stack trace.","grpc_status":3}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "migrate.py", line 92, in <module>
init(data_path="uniprot-asthma-proteins.fasta")
File "migrate.py", line 23, in init
session, q_get_database, q_insert_database, "$db"
File "/home/aditya/Projects/RD/biograkn/blast/util.py", line 14, in insert_if_non_existent
found_list = list(read_transaction.query(get_query))
File "/home/aditya/anaconda3/envs/RD/lib/python3.7/site-packages/grakn/client.py", line 131, in query
return self._tx_service.query(query, infer)
File "/home/aditya/anaconda3/envs/RD/lib/python3.7/site-packages/grakn/service/Session/TransactionService.py", line 49, in query
response = self._communicator.send(request)
File "/home/aditya/anaconda3/envs/RD/lib/python3.7/site-packages/grakn/service/Session/TransactionService.py", line 165, in send
raise GraknError("Server/network error: {0}\n\n generated from request: {1}".format(e, request))
grakn.exception.GraknError.GraknError: Server/network error: <_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "GraqlSemanticException-label 'database' not found. Please check server logs for the stack trace."
debug_error_string = "{"created":"#1582269484.666990683","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"GraqlSemanticException-label 'database' not found. Please check server logs for the stack trace.","grpc_status":3}"
>
generated from request: query_req {
query: "match $db isa database, has name \"uniprot\"; get $db;"
}
I don't understand why the data cannot be loaded. Any help would be appreciated.
There appears to be a problem with the schema already, since Graql is looking for the label "database" but cannot find it.
You can check your schema using Workbase:
1. connect to your local server (localhost:48555)
2. select your keyspace ("proteins")
3. click on the hierarchy symbol on the top left next to the Grakn logo
If the schema is present, it will be visualized automatically.
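Once you know which labels the schema contains, you can compare them with the labels a query refers to. The sketch below does this for the query from the traceback. It is a rough regex heuristic, not a Graql parser, and the schema set shown is a made-up placeholder for whatever Workbase displays.

```python
import re


def labels_used(query: str) -> set:
    """Collect type labels that follow 'isa' or 'has' in a Graql query (heuristic)."""
    return set(re.findall(r"\b(?:isa|has)\s+([A-Za-z][\w-]*)", query))


def missing_labels(query: str, schema_labels) -> list:
    """Return labels referenced by the query but absent from the schema."""
    return sorted(labels_used(query) - set(schema_labels))


if __name__ == "__main__":
    query = 'match $db isa database, has name "uniprot"; get $db;'
    # Hypothetical schema contents; replace with the labels shown in Workbase.
    schema = {"protein", "name"}
    print(missing_labels(query, schema))  # prints ['database']
```

If "database" shows up as missing, the schema file that defines it was probably not loaded into the keyspace the migration script connects to.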

AirFlow SFTP upload using public key file

I am trying to upload a file to an SFTP server using a key file. I have already configured the connection and I can authenticate without any problem:
{'key_file': '/my_folder/public_key'}
I am also able to do the whole process manually, for example using Cyberduck. This is the function that I am calling:
from contextlib import closing
from airflow.contrib.hooks.ssh_hook import SSHHook
# Get connection details
ssh = SSHHook(ssh_conn_id='my conn id')
# Upload the file to the SFTP server; closing() closes the SFTP client afterwards
with closing(ssh.get_conn().open_sftp()) as sftp_client:
    sftp_client.put('/local_folder/my_file.xlsx', '/sftp_folder/my_file.xlsx')
This is the error I am receiving:
{base_hook.py:80} INFO - Using connection to: xxxxxxx
{transport.py:1687} INFO - Connected (version 2.0, client AWS_SFTP_1.0)
{transport.py:1687} INFO - Authentication (publickey) successful!
PermissionError: [Errno 13] Forbidden
Does anyone have any idea of why this is happening if I am able to do the same manually?
Thank you so much!
The whole stack:
{transport.py:1687} INFO - Authentication (publickey) successful!
{sftp.py:131} INFO - [chan 0] Opened sftp connection (server version 3)
Traceback (most recent call last):
File "/.../airflow/plugins/operators/my_operator.py", line 231, in sftp_upload
client.put(local_path, sftp_path)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 727, in put
return self.putfo(fl, remotepath, file_size, callback, confirm)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 683, in putfo
with self.file(remotepath, 'wb') as fr:
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 341, in open
t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 780, in _request
return self._read_response(num)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 832, in _read_response
self._convert_status(msg)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 863, in _convert_status
raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Forbidden
The problem was caused by an invalid path in the SFTP folder. Cyberduck was hiding part of the path, so I was using an incomplete path in my code. Paramiko returned Forbidden probably because the path exists but this account doesn't have access to it.
Once I included the full path, the code above worked fine!
Thanks!
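A path problem like this can be narrowed down by checking each directory of the remote path in turn before calling put. The sketch below is one way to do that, assuming an object with a paramiko-style `stat(path)` method that raises an OSError subclass on failure. The helper name `first_inaccessible_prefix` is hypothetical, and some servers may refuse `stat` on parent directories even when the final directory is writable.

```python
import posixpath


def first_inaccessible_prefix(sftp_client, remote_path: str):
    """Return (prefix, error) for the first directory stat() rejects, else None."""
    directory = posixpath.dirname(remote_path)
    prefix = "/" if directory.startswith("/") else ""
    for part in directory.strip("/").split("/"):
        if not part:
            continue
        prefix = posixpath.join(prefix, part)
        try:
            # paramiko's SFTPClient.stat raises IOError (an OSError) on failure
            sftp_client.stat(prefix)
        except OSError as err:
            return prefix, err
    return None
```

Inside the `with closing(...)` block from the question, calling `first_inaccessible_prefix(sftp_client, '/sftp_folder/my_file.xlsx')` before `put` would report which part of the path the server rejects.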

error creating container in openstack swift

I am trying to install the latest version of Swift following the instructions at http://docs.openstack.org/icehouse/install.../general-installation-steps-swift.html. I am able to authenticate with Keystone and can successfully run the command swift stat. But when I run the command swift upload myfiles temp, I get the following error:
Error trying to create container 'myfiles': 404 Not Found: {"error": {"message": "The
resource could not be found.", "c
Object PUT failed: 9.109.124.109:5000:5000/v2.0/myfiles/temp 400 Bad Request
[first 60 chars of response] {"error": {"message": "Expecting to find application/json
in
In /var/log/syslog, I find the following information:
May 28 18:11:40 datafed3 account-server: ERROR __call__ error with PUT /sdb1/100869
/AUTH_system/myfiles : #012Traceback (most recent call last):#012 File "/usr/lib
/python2.7/dist-packages/swift/account/server.py", line 284, in __call__#012 res =
method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line
2217, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/dist-
packages/swift/common/utils.py", line 837, in _timing_stats#012 resp = func(ctrl,
*args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py",
line 128, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-
packages/swift/account/backend.py", line 210, in put_container#012 raise
DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError:
DB connection error (/srv/node/sdb1/accounts/100869/80d/62816079be0fc97a4557f52b3b12380d
/62816079be0fc97a4557f52b3b12380d.db, 0):#012DB doesn't exist
One situation that may cause this problem: one or more storage nodes were down when the tenant was created. Then, when you upload an object, the proxy gets a 404 from at least one storage node.
In my test, the 404 error persisted even after all storage nodes were brought back up following tenant creation. So make sure all storage nodes are up, and create another tenant to test.
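One simple way to check that the storage nodes are at least listening before creating the test tenant is a TCP connection test. The sketch below is only a first check: an open port does not prove the service is healthy. The host addresses and the port are placeholders; 6002 is assumed here as the account server port, so confirm it against your account-server.conf.

```python
import socket


def unreachable_nodes(nodes, timeout=2.0):
    """Return the (host, port) pairs that refuse or time out on a TCP connect."""
    down = []
    for host, port in nodes:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            down.append((host, port))
    return down


if __name__ == "__main__":
    # Placeholder addresses; list your own storage nodes here.
    storage_nodes = [("192.0.2.10", 6002), ("192.0.2.11", 6002)]
    for host, port in unreachable_nodes(storage_nodes):
        print(f"cannot reach {host}:{port}")
```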
