Apache Airflow authentication through IdentityServer4(.NET) - airflow

I installed Airflow on a VM and am trying to integrate it with an identity gateway server. The server is an implementation of IdentityServer4 (https://identityserver4.readthedocs.io/en/latest/).
I have created a client in IdentityServer and looked for documentation on the integration,
but I did not find any references for this.
**** Updating after trying out Python Flask AppBuilder OAuth ****
I followed https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth
and configured the related parameters. Here is my configuration:
OAUTH_PROVIDERS = [{
    'name': 'azure',
    'token_key': 'access_token',
    'remote_app': {
        'api_base_url': '<gateway>/',
        'client_kwargs': {
            'scope': 'profile openid email'
        },
        'access_token_url': '<gateway>/connect/token',
        'authorize_url': '<gateway>/connect/authorize',
        'request_token_url': None,
        'client_id': 'AIRFLOW_DEMO_CLIENT',
        'client_secret': None,
    }
}]
I am not sure whether the name azure is right.
The good news is that Airflow now sends me to the IdentityServer login page. The problem is that I get an error after authenticating:
[2021-09-18 05:42:28,548] {app.py:1892} ERROR - Exception on /oauth-authorized/azure [GET]
Traceback (most recent call last):
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/azureuser/af_env/lib/python3.6/site-packages/flask_appbuilder/security/views.py", line 655, in oauth_authorized
resp = self.appbuilder.sm.oauth_remotes[provider].authorize_access_token()
File "/home/azureuser/af_env/lib/python3.6/site-packages/authlib/integrations/flask_client/remote_app.py", line 74, in authorize_access_token
params = self.retrieve_access_token_params(flask_req, request_token)
File "/home/azureuser/af_env/lib/python3.6/site-packages/authlib/integrations/base_client/base_app.py", line 145, in retrieve_access_token_params
params = self._retrieve_oauth2_access_token_params(request, params)
File "/home/azureuser/af_env/lib/python3.6/site-packages/authlib/integrations/base_client/base_app.py", line 126, in _retrieve_oauth2_access_token_params
raise MismatchingStateError()
authlib.integrations.base_client.errors.MismatchingStateError: mismatching_state: CSRF Warning! State not equal in request and response.
I see the same error on the web page after being redirected to this URL:
https://<my_ip>/oauth-authorized/azure?code=721656196&scope=profile%20openid%20email&state=eyJ0eXAiOiJKV1QinPMa58&session_state=wrO1****dYhY.29**A01
I am trying to map user info from IdentityServer to Airflow.

Airflow uses Flask AppBuilder for authentication, so check the Flask AppBuilder documentation for how to integrate IdentityServer4. I assume the gateway supports OAuth2 or OpenID Connect; one of those is likely your best choice.
https://flask-appbuilder.readthedocs.io/en/latest/security.html
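For the user-info mapping mentioned in the question, Flask AppBuilder lets a custom security manager override oauth_user_info(provider, response) and return a dict of user fields. The sketch below shows only the mapping step as a plain function. It assumes the gateway returns standard OpenID Connect claims (sub, email, given_name, family_name, preferred_username); the function name claims_to_user_info is hypothetical, and the returned keys mirror those used by Flask AppBuilder's built-in providers.

```python
def claims_to_user_info(claims: dict) -> dict:
    """Map OpenID Connect claims to the user fields Flask AppBuilder expects.

    Assumption: `claims` is the decoded JSON from the provider's userinfo
    endpoint and always contains the mandatory `sub` claim.
    """
    # Prefer a human-readable login name, then the email, then the subject id.
    username = claims.get("preferred_username") or claims.get("email") or claims["sub"]
    return {
        "username": username,
        "email": claims.get("email", ""),
        "first_name": claims.get("given_name", ""),
        "last_name": claims.get("family_name", ""),
    }


if __name__ == "__main__":
    sample = {"sub": "42", "email": "ada@example.com", "given_name": "Ada"}
    print(claims_to_user_info(sample))
```

Inside a security manager subclass, oauth_user_info would fetch the claims (IdentityServer4 serves them at connect/userinfo) and pass them through a function like this one.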

Related

Facing (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") in cloud composer

I am facing this issue:
(2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
on Cloud Composer, version composer-1.16.5-airflow-1.10.14. The issue is intermittent. We have tried cleaning the Airflow metadata and modifying the code (for example, replacing Variable.get() with the Jinja template) to reduce the load on the database, but we still hit the error daily. We also restarted the scheduler, but the issue started occurring again after two days. The CPU and memory usage graphs of the Airflow database in Composer monitoring are constant, yet the SQL database goes into an unhealthy state after some time.
The full error message is:
Traceback (most recent call last):
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
return fn()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 364, in connect
return _ConnectionFairy._checkout(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
return self._create_connection()
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
self.__connect(first_connect_check=True)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 493, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/__init__.py", line 85, in Connect
return Connection(*args, **kwargs)
File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 208, in __init__
super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
The error is too general to point to a single cause. Known causes:
Connections are blocked by firewall rules.
The instance is being restarted, which can cause the error temporarily.
Generic GKE failures because the nodes running airflow-sqlproxy are overloaded.
Since the issue is intermittent, we can rule out firewall rules as the cause. Check whether any instances were restarted around the time of the failures. To avoid generic GKE failures, you can upgrade your machine types to allocate more resources.
Also, as I mentioned in the comments, you are using an old version of Composer that has been out of support since May 2022. It is better to upgrade to a Composer version that Google still supports.
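Because the error is transient (for example, while an instance restarts), database calls made from your own task code can be retried with a backoff. This is a minimal sketch, not something Composer provides: the helper names retry_transient and is_lost_connection are hypothetical, and the check assumes the exception carries the MySQL client error code as its first argument, as in the traceback above. It does not affect connections that Airflow itself opens.

```python
import time


def is_lost_connection(exc):
    """Return True if the exception looks like MySQL client error 2006."""
    # Assumption: the error code is the first element of exc.args,
    # as in (2006, "Lost connection to MySQL server ...").
    return bool(exc.args) and exc.args[0] == 2006


def retry_transient(func, is_transient, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or when the error is not transient.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the helper can be exercised without real delays.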

"Bad Request-Error" when trying to connect to Azure Data Lake with Airflow

I am trying to connect to Azure Data Lake using Airflow. I set up the Airflow connection through the web UI.
When I test the connection with the Test button, I get a Bad Request error.
I use the correct UUIDs; they have been verified in other cases. I also checked the firewall.
When I execute the DAG, I use the Azure Data Lake connection ID to check whether a file exists, applying the method described in "What is the best way to check if a file exists on an Azure Datalake using Apache Airflow?"
This is the error I get:
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:The token response from the server is unparseable as JSON: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
wire_response = json.loads(body)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 3 column 1 (char 4)
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:Error validating get token response: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 238, in _handle_get_token_response
return self._validate_token_response(body)
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
Authentication to Azure Data Lake uses token credentials: add the specific credentials (client_id, secret, tenant) and the account name to the Airflow connection.
Information about how to set it up can be found in this doc.
You can see a code example in the test function in the source code.
Other methods of authentication are currently not supported.
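As an illustration of which value goes where, here is a sketch that assembles the four values into a connection description. The helper build_adl_connection is hypothetical. The field layout (client ID as login, secret as password, tenant and account_name as extras) is my reading of the Azure provider's connection form and should be treated as an assumption; older provider releases may expect differently named extra fields.

```python
import json


def build_adl_connection(client_id, client_secret, tenant_id, account_name):
    """Describe an Azure Data Lake connection as a plain dict.

    Assumption: Airflow's azure_data_lake connection reads the client id
    from `login`, the secret from `password`, and the rest from `extra`.
    """
    return {
        "conn_type": "azure_data_lake",
        "login": client_id,         # service principal (application) id
        "password": client_secret,  # service principal secret
        "extra": {"tenant": tenant_id, "account_name": account_name},
    }


if __name__ == "__main__":
    conn = build_adl_connection(
        "00000000-0000-0000-0000-000000000001",  # placeholder client id
        "not-a-real-secret",                     # placeholder secret
        "00000000-0000-0000-0000-000000000002",  # placeholder tenant id
        "mydatalake",                            # placeholder account name
    )
    print(json.dumps(conn, indent=2))
```

Note that account_name here is the short store name, not a full URL.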
I was trying to get the connection running using the Airflow implementation. My impression was that it was buggy and did not work out well. The above situation happened with Airflow 2.2.5. When I upgraded to Airflow 2.3.0, the test button was grayed out.
The final solution was to use Access Tokens instead.

Azure Cognitive Services -> KeyError: 'Endpoint'

I am using the Python SDK for Computer Vision published in Microsoft Docs (https://learn.microsoft.com/es-es/azure/cognitive-services/computer-vision/quickstarts-sdk/python-sdk).
When I run the code, this error occurs:
Traceback (most recent call last):
File "c:/analyze_image_local.py", line 68, in <module>
description_result = computervision_client.describe_image_in_stream(local_image)
File "C:\Anaconda3\lib\site-packages\azure\cognitiveservices\vision\computervision\operations\_computer_vision_client_operations.py", line 1202, in describe_image_in_stream
request = self._client.post(url, query_parameters, header_parameters, body_content)
File "C:\Anaconda3\lib\site-packages\msrest\service_client.py", line 193, in post
request = self._request('POST', url, params, headers, content, form_content)
File "C:\Anaconda3\lib\site-packages\msrest\service_client.py", line 108, in _request
request = ClientRequest(method, self.format_url(url))
File "C:\Anaconda3\lib\site-packages\msrest\service_client.py", line 155, in format_url
base = self.config.base_url.format(**kwargs).rstrip('/')
KeyError: 'Endpoint'
I used the REST API method instead (https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/python-disk).
With that approach, it can be useful to complete the endpoint URL with this line:
analyze_url = endpoint + "vision/v2.1/analyze"
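Plain concatenation only works if endpoint ends with a slash. A small sketch that tolerates both forms follows; the helper name build_analyze_url is hypothetical, and the v2.1 path is the one quoted above.

```python
def build_analyze_url(endpoint: str, api_version: str = "v2.1") -> str:
    """Join a Computer Vision endpoint and the analyze path with one slash."""
    # Strip any trailing slash so the result never contains "//vision".
    return endpoint.rstrip("/") + "/vision/" + api_version + "/analyze"


if __name__ == "__main__":
    # Placeholder resource name, not a real endpoint.
    print(build_analyze_url("https://example.cognitiveservices.azure.com/"))
```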
Simply reinstalling the customvision SDK with the following commands worked for me:
pip uninstall azure-cognitiveservices-vision-customvision
pip install azure-cognitiveservices-vision-customvision

How to Use Firebase-Admin SDK with Credentials of ServiceAccount in Different Project?

I have two Google Cloud Platform projects - let's call them proj-a and proj-b. I have a GCP ServiceAccount created in proj-a that tries to access user objects that are managed by Firebase Authentication running on top of proj-b.
The ServiceAccount has been assigned the Firebase Authentication Admin Google Cloud IAM role on the firebase project.
The output of the following code snippet looks promising:
import firebase_admin
from firebase_admin import auth
app = firebase_admin.initialize_app(options={"projectId": "proj-b"})
print(f"app: {app.project_id}")
print(f"creds: {app.credential.project_id}")
app: proj-b
creds: proj-a
But when I now call auth.get_user("some-id") I get the error message: Identity Toolkit API has not been used in project {PROJECT_NUM_OF_PROD_A} before or it is disabled. Of course, the identity toolkit has not been enabled on proj-a since Firebase is running on proj-b. How to get this running? The ServiceAccount is located in proj-a because most other components of the backend are located there. Defining the ServiceAccount in proj-b is therefore not an acceptable solution for me.
Full (cleaned) stack trace below:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 397, in get_user
response = self._client.request('post', 'getAccountInfo', json=payload)
File "/usr/local/lib/python3.7/site-packages/firebase_admin/auth.py", line 514, in request
resp.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.googleapis.com/identitytoolkit/v3/relyingparty/getAccountInfo
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/firebase_admin/auth.py", line 230, in get_user
response = user_manager.get_user(uid=uid)
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 400, in get_user
self._handle_http_error(INTERNAL_ERROR, msg, error)
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 545, in _handle_http_error
raise ApiCallError(code, msg, error)
firebase_admin._user_mgt.ApiCallError: Failed to get user by user ID: some-id.
Server response: {
"error": {
"code": 403,
"message": "Identity Toolkit API has not been used in project {PROJECT_NUM_OF_PROD_A} before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/identitytoolkit.googleapis.com/overview?project={PROJECT_NUM_OF_PROD_A} then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
"errors": [
{
"message": "Identity Toolkit API has not been used in project {PROJECT_NUM_OF_PROD_A} before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/identitytoolkit.googleapis.com/overview?project={PROJECT_NUM_OF_PROD_A} then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
"domain": "usageLimits",
"reason": "accessNotConfigured",
"extendedHelp": "https://console.developers.google.com"
}
],
"status": "PERMISSION_DENIED"
}
}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/firebase_admin/auth.py", line 233, in get_user
raise AuthError(error.code, str(error), error.detail)
firebase_admin.auth.AuthError: Failed to get user by user ID: some-id.
Server response: {
"error": {
"code": 403,
"message": "Identity Toolkit API has not been used in project {PROJECT_NUM_OF_PROD_A} before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/identitytoolkit.googleapis.com/overview?project=543111740960 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
"errors": [
{
"message": "Identity Toolkit API has not been used in project {PROJECT_NUM_OF_PROD_A} before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/identitytoolkit.googleapis.com/overview?project=543111740960 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
"domain": "usageLimits",
"reason": "accessNotConfigured",
"extendedHelp": "https://console.developers.google.com"
}
],
"status": "PERMISSION_DENIED"
}
}
Update after updating firebase-admin client library
As mentioned by @Hiranya Jayathilaka, I was not running the latest version of the Firebase Admin SDK. After updating from version 2.14.0 to 3.2.1, the app appears to connect to the correct project, but I still get a permission denied error. I checked the permissions of the ServiceAccount on proj-b and even gave it roles/firebase.admin as well as roles/editor, just to make sure I do not lack any necessary permissions.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 479, in get_user
'post', '/accounts:lookup', json=payload)
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 113, in body_and_response
resp = self.request(method, url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 105, in request
resp.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://identitytoolkit.googleapis.com/v1/projects/{PROJECT_ID_OF_PROD_B}/accounts:lookup
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/firebase_admin/auth.py", line 268, in get_user
response = user_manager.get_user(uid=uid)
File "/usr/local/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 481, in get_user
raise _auth_utils.handle_auth_backend_error(error)
firebase_admin.exceptions.PermissionDeniedError: Error while calling Auth service (Identity Toolkit API has not been used in project {PROJECT_NUM_OF_PROD_A} before or it is disabled. Enable it by visiting https). //console.developers.google.com/apis/api/identitytoolkit.googleapis.com/overview?project={PROJECT_NUM_OF_PROD_A} then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
You seem to be using an old version of the Python SDK. Old versions used to identify the target project for Auth API calls from the service account. This is evident from the legacy endpoint that it's trying to reach: https://www.googleapis.com/identitytoolkit/v3/relyingparty/getAccountInfo. Since your service account is from proj-a, that's what it's targeting.
If you use v2.16.0 or higher, the SDK will connect to the new project-specific endpoint. Specifically, you need this change to be included in your SDK: https://github.com/firebase/firebase-admin-python/pull/256
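A quick way to check an installed version against that threshold is to compare version tuples. This is a sketch with hypothetical helper names; it assumes plain numeric versions such as 2.14.0 and would need adjusting for pre-release suffixes.

```python
def version_tuple(version: str) -> tuple:
    """Turn '2.16.0' into (2, 16, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def uses_project_endpoint(installed: str, minimum: str = "2.16.0") -> bool:
    """Return True if `installed` is at least the version named above."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for candidate in ("2.14.0", "2.16.0", "3.2.1"):
        print(candidate, uses_project_endpoint(candidate))
```

The installed version string can be read from firebase_admin.__version__.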

AirFlow SFTP upload using public key file

I am trying to upload a file to an SFTP server using a key file. I have already configured the connection and can authenticate without any problem:
{'key_file': '/my_folder/public_key'}
I am also able to do the whole process manually, using Cyberduck for example. This is the function that I am calling:
from contextlib import closing
from airflow.contrib.hooks.ssh_hook import SSHHook
# Get connection details
ssh = SSHHook(ssh_conn_id='my conn id')
# Upload the file into sftp
with closing(ssh.get_conn().open_sftp()) as sftp_client:
    sftp_client.put('/local_folder/my_file.xlsx', '/sftp_folder/my_file.xlsx')
This is the error I am receiving:
{base_hook.py:80} INFO - Using connection to: xxxxxxx
{transport.py:1687} INFO - Connected (version 2.0, client AWS_SFTP_1.0)
{transport.py:1687} INFO - Authentication (publickey) successful!
PermissionError: [Errno 13] Forbidden
Does anyone have any idea of why this is happening if I am able to do the same manually?
Thank you so much!
The whole stack:
{transport.py:1687} INFO - Authentication (publickey) successful!
{sftp.py:131} INFO - [chan 0] Opened sftp connection (server version 3)
Traceback (most recent call last):
File "/.../airflow/plugins/operators/my_operator.py", line 231, in sftp_upload
client.put(local_path, sftp_path)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 727, in put
return self.putfo(fl, remotepath, file_size, callback, confirm)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 683, in putfo
with self.file(remotepath, 'wb') as fr:
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 341, in open
t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 780, in _request
return self._read_response(num)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 832, in _read_response
self._convert_status(msg)
File "/.../venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 863, in _convert_status
raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Forbidden
The problem was an invalid path on the SFTP server. Cyberduck was hiding part of the path, so I was using an incomplete path in my code. Paramiko returned Forbidden, probably because the path exists but this account does not have access to it.
Once I included the full path, the code above worked fine!
Thanks!
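To find out which part of a remote path the server rejects, you can probe each prefix with stat() before uploading. This is a sketch with a hypothetical helper name; it works with any client object that has a stat(path) method raising IOError on failure, which is how Paramiko's SFTPClient behaves. Some servers may also refuse stat() on directories above the account's home, so treat the result as a hint.

```python
import posixpath


def first_inaccessible_prefix(sftp_client, remote_path):
    """Return the shortest prefix of remote_path that stat() rejects, or None."""
    parts = [part for part in remote_path.split("/") if part]
    prefix = "/" if remote_path.startswith("/") else ""
    for part in parts:
        prefix = posixpath.join(prefix, part)
        try:
            sftp_client.stat(prefix)
        except IOError:
            # Missing or forbidden: this is the first component that fails.
            return prefix
    return None
```

Calling it on the target directory, for example first_inaccessible_prefix(sftp_client, '/sftp_folder'), before put() shows whether the directory itself is the problem.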
