I am writing a DAG in Google Composer to call two Cloud Functions inside my workflow. I have created a class for it by subclassing SimpleHttpOperator:
import google.auth.transport.requests
from google.oauth2 import id_token
# import paths below assume the Airflow HTTP provider package
from airflow.providers.http.hooks.http import HttpHook
from airflow.providers.http.operators.http import SimpleHttpOperator

class cfSFTP2GCSOp(SimpleHttpOperator):
    def execute(self, context):
        http = HttpHook(self.method, http_conn_id=self.http_conn_id)
        self.log.info("Calling HTTP method")
        # fetch an identity token for the target Cloud Run / Cloud Functions URL
        target_audience = 'https://dw-etl-transactor-unzip-files-nowvpwp6oq-uc.a.run.app'
        request = google.auth.transport.requests.Request()
        idt = id_token.fetch_id_token(request, target_audience)
        self.headers = {'Authorization': "Bearer " + idt}
        response = http.run(self.endpoint,
                            self.data,
                            self.headers,
                            self.extra_options)
        self.log.info(response)
        # compare the status code rather than the string repr of the response
        return response.status_code == 200
which I use inside the task to call the function:
gcp_cf_dw_etl_transactor_unzip_files = cfSFTP2GCSOp(
    task_id='gcp_cf_dw_etl_unzip_files',
    method='POST',
    http_conn_id='gcp_cf_dw_etl_unzip_files',
    data={},
    endpoint='/',
    headers={},
    response_check=lambda response: response.status_code == 200,
    dag=dag,
)
Right now I use one task per Cloud Function, but what happens if I need to call many more Cloud Functions? Is it possible to call all of them inside the same task, or do I need to keep doing what I am currently doing?
Thanks in advance!
TLDR
In the Python callable for a SimpleHttpOperator response function, I am trying to push an XCom that combines information from two sources (a hash of the filename/path and an object lookup from a DB) to a specified key.
Longer Tale
I have a file sensor written which grabs all new files and passes them to a MultiDagRun to process the (scientific) information in the files in parallel via XCom. Works great. The SimpleHttpOperator POSTs filepath info to a submission API and receives back a task_id, which it must then use to poll another (slow running) API for the result. This I all have working fine. Files get scanned, multiple DAGs are launched to process them, and objects are returned.
But... I cannot puzzle out how to push the result to an XCom inside the Python response function for the SimpleHttpOperator.
My Google, SO, and Reddit-fu has failed me here (and it seems overkill to use the PythonOperator, though that's my next stop). I notice a lot of people asking similar questions though.
How do you use context, ti, task_instance, or context['task_instance'] with the response function? (I cannot use the "return value" XCom, as I need to distinguish the XCom keys for parallel processing, afaik.) As a default, I have context set to true in the default_args.
I'm sure I am missing something simple here, but I'm stumped as to what it is (note: I did try **kwargs and ti = kwargs['ti'] below as well before hitting SO).
def _handler_object_result(response, file):
    # Note: related to the API I am calling, not Airflow internal task ids
    header_result = response.json()
    task_id = header_result["task"]["id"]
    api = "https://redacted.com/api/task/result/{task_id}".format(task_id=task_id)
    resp = requests.get(api, verify=False).json()
    data = json.loads(resp["data"])
    file_object = json.dumps(data["OBJECT"])
    file_hash = hash(file)
    # This is the part that is not working, as I am unsure how
    # to access the task instance to do the xcom_push
    ti.xcom_push(key=file_hash, value=file_object)
    if ti.xcom_pull(key=file_hash):
        return True
    else:
        return False
and the Operator:
object_result = SimpleHttpOperator(
    task_id="object_result",
    method='POST',
    data=json.dumps({"file": "{{ dag_run.conf['file'] }}", "keyword": "object"}),
    http_conn_id="coma_api",
    endpoint="/api/v1/file/describe",
    headers={"Content-Type": "application/json"},
    extra_options={"verify": False},
    response_check=lambda response: _handler_object_result(response, "{{ dag_run.conf['file'] }}"),
    do_xcom_push=False,
    dag=dag,
)
I was really expecting the task_instance object to be available in some fashion, either by default or via configuration, but every variation that has worked elsewhere (FileSensor, PythonOperator, etc.) hasn't worked here, and I've been unable to google the magic words to make it accessible.
You can try using the get_current_context() function in your response_check function:
from airflow.operators.python import get_current_context
def _handler_object_result(response, file):
    # Note: related to the API I am calling, not Airflow internal task ids
    header_result = response.json()
    task_id = header_result["task"]["id"]
    api = "https://redacted.com/api/task/result/{task_id}".format(task_id=task_id)
    resp = requests.get(api, verify=False).json()
    data = json.loads(resp["data"])
    file_object = json.dumps(data["OBJECT"])
    file_hash = hash(file)
    ti = get_current_context()["ti"]  # <- Try this
    ti.xcom_push(key=file_hash, value=file_object)
    if ti.xcom_pull(key=file_hash):
        return True
    else:
        return False
That function is a nice way of accessing the task's execution context when context isn't explicitly handed to you, or when you don't want to pass context attributes around just to reach them deep in your logic stack.
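For illustration, here is a minimal sketch of the same idea in isolation, assuming Airflow 2.x with the HTTP provider installed; the connection id, endpoint, and XCom key are made up:

from airflow.operators.python import get_current_context
from airflow.providers.http.operators.http import SimpleHttpOperator

def _check_and_push(response):
    # response_check runs inside the task's execution context,
    # so get_current_context() can hand us the task instance
    ti = get_current_context()["ti"]
    ti.xcom_push(key="submission_status", value=response.status_code)  # made-up key
    return response.status_code == 200

submit = SimpleHttpOperator(
    task_id="submit",
    http_conn_id="my_api",      # made-up connection id
    endpoint="/api/v1/submit",  # made-up endpoint
    method="POST",
    response_check=_check_and_push,
    dag=dag,
)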
I need help understanding how to process a user-supplied token in my FastAPI app.
I have a simple app that takes a user-session key; this may or may not be a JWT. I then call a separate API to validate this token and either proceed with the request or reject it.
Where should this key go in the request:
In the Authorization header as a basic token?
In a custom user-session header key/value?
In the request body with the rest of the required information?
I've been playing around with option 2 and have found several ways of doing it:
Using APIKey as described here:
async def create(api_key: APIKey = Depends(validate)):
Declaring it in the function as described in the docs here
async def create(user_session: str = Header(description="The user's session key")): and having a separate Depends in the router config.
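For reference, here is a minimal, self-contained sketch of that second variant; the header name, route, and validation logic are placeholders rather than my real code:

from fastapi import APIRouter, Depends, Header, HTTPException, status

async def validate(user_session: str = Header(..., description="The user's session key")):
    # placeholder: call the separate validation API here and raise on failure
    if user_session != "expected-value":
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid session")
    return user_session

# the dependency can also be attached once at the router level
router = APIRouter(dependencies=[Depends(validate)])

@router.post("/items")
async def create():
    return {"ok": True}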
The best approach is to build a custom dependency using any one of the already existing authentication dependencies as a reference.
Example:
from typing import Optional

from fastapi import HTTPException, Request
from fastapi.openapi.models import APIKey, APIKeyIn
from fastapi.security.api_key import APIKeyBase
from starlette.status import HTTP_403_FORBIDDEN

class APIKeyHeader(APIKeyBase):
    def __init__(
        self,
        *,
        name: str,
        scheme_name: Optional[str] = None,
        description: Optional[str] = None,
        auto_error: bool = True
    ):
        self.model: APIKey = APIKey(
            **{"in": APIKeyIn.header}, name=name, description=description
        )
        self.scheme_name = scheme_name or self.__class__.__name__
        self.auto_error = auto_error

    async def __call__(self, request: Request) -> Optional[str]:
        api_key: str = request.headers.get(self.model.name)
        # add your logic here, something like the check below
        if not api_key:
            if self.auto_error:
                raise HTTPException(
                    status_code=HTTP_403_FORBIDDEN, detail="Not authenticated"
                )
            else:
                return None
        return api_key
After that, just follow this part of the documentation to use your dependency.
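As a rough illustration of that last step (the header name, route, and validation call are placeholders), wiring the custom class into an endpoint could look like this:

from fastapi import Depends, FastAPI

app = FastAPI()
user_session_scheme = APIKeyHeader(name="user-session")  # the custom class defined above

async def validate(user_session: str = Depends(user_session_scheme)) -> str:
    # placeholder: call your separate validation API here and raise HTTPException on failure
    return user_session

@app.post("/items")
async def create(user_session: str = Depends(validate)):
    return {"session": user_session}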
For an application, I have followed the FastAPI documentation for the authentication process.
By default, OAuth2PasswordBearer raises an HTTPException with status code 401, so I can't check whether a user is actually connected without returning a 401 error to the client.
An example of what I want to do:
app = FastAPI()

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="api/users/token")

def get_current_user(token: str = Depends(oauth2_scheme)):
    try:
        settings = get_settings()
        payload = jwt.decode(token, settings.secret_key,
                             algorithms=[settings.algorithm_hash])
        email = payload.get("email")
        if email is None:
            raise credentials_exception
        token_data = TokenData(email=email)
    except jwt.JWTError:
        raise credentials_exception
    user = UserNode.get_node_with_email(token_data.email)
    if user is None:
        raise credentials_exception
    return user

@app.get('/')
def is_connected(user = Depends(get_current_user)):
    # here, I can't do anything if the user is not connected,
    # because an exception is raised in the OAuth2PasswordBearer __call__ method ...
    return
I see the OAuth2PasswordBearer class has an "auto_error" attribute, which controls whether its __call__ method returns None or raises an error:
if not authorization or scheme.lower() != "bearer":
    if self.auto_error:
        raise HTTPException(
            status_code=HTTP_401_UNAUTHORIZED,
            detail="Not authenticated",
            headers={"WWW-Authenticate": "Bearer"},
        )
    else:
        return None
So I thought about a workaround:
app = FastAPI()

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="api/users/token", auto_error=False)

def get_current_user(token: str = Depends(oauth2_scheme)):
    if not token:
        return None
    # [ ... same token decoding logic as before ... ]
    return user

@app.get('/')
def is_connected(user = Depends(get_current_user)):
    return user
It works, but I wonder what other ways there are to do this. Is there a more "official" method?
This is a good question and as far as I know, there isn't an "official" answer that is universally agreed upon.
The approach I've seen most often in the FastAPI applications that I've reviewed involves creating multiple dependencies for each use case.
While the code works much like the example you've provided, the key difference is that it attempts to parse the JWT every time instead of only raising the credentials exception when the token is missing. Make sure the dependency accounts for malformed JWTs, invalid JWTs, and so on.
Here's an example adapted to the general structure you've specified:
# ...other code

oauth2_scheme = OAuth2PasswordBearer(
    tokenUrl="api/users/token",
    auto_error=False
)

auth_service = AuthService()  # service responsible for JWT management

async def get_user_from_token(
    token: str = Depends(oauth2_scheme),
    user_node: UserNode = Depends(get_user_node),
) -> Optional[User]:
    try:
        email = auth_service.get_email_from_token(
            token=token,
            secret_key=config.SECRET_KEY
        )
        user = await user_node.get_node_with_email(email)
        return user
    except Exception:
        # exceptions may include no token, expired JWT, malformed JWT,
        # or database errors - either way we ignore them and return None
        return None

def get_current_user_required(
    user: Optional[User] = Depends(get_user_from_token)
) -> Optional[User]:
    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="An authenticated user is required for that action.",
            headers={"WWW-Authenticate": "Bearer"},
        )
    return user

def get_current_user_optional(
    user: Optional[User] = Depends(get_user_from_token)
) -> Optional[User]:
    return user
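As a usage sketch (assuming the usual app = FastAPI() instance; the route paths and the user.email attribute are made up for illustration), routes then pick whichever dependency matches their requirement:

@app.get("/me")
async def read_profile(user: User = Depends(get_current_user_required)):
    # a 401 is raised automatically when no valid token was supplied
    return {"email": user.email}

@app.get("/feed")
async def read_feed(user: Optional[User] = Depends(get_current_user_optional)):
    # works for anonymous visitors too; user is None in that case
    return {"personalized": user is not None}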
I have created an app as follows:
X_API_KEY = APIKeyHeader(name='X-API-Key')

def validate_api_key(x_api_key: str = Depends(X_API_KEY)):
    if x_api_key == ENV_API_KEY:
        return True
    raise HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Invalid API Key",
    )

app = FastAPI(
    title="My boring app",
    version=APP_VERSION,
    dependencies=[Security(validate_api_key)],
    root_path="/api/v1"
)

@app.get("/secretdata")
def secretdata() -> dict:
    return 'data'

@app.get("/")
def is_alive() -> dict:
    return True
How can I whitelist the '/' path from security (api key)?
One way of achieving this is to split your application into multiple routers as shown in the example for bigger applications in the FastAPI documentation.
Here's an example fitting your case:
# add import
from fastapi import APIRouter

X_API_KEY = APIKeyHeader(name='X-API-Key')

def validate_api_key(x_api_key: str = Depends(X_API_KEY)):
    if x_api_key == ENV_API_KEY:
        return True
    raise HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Invalid API Key",
    )

app = FastAPI(
    title="My boring app",
    version=APP_VERSION,
    # removed global dependency
    root_path="/api/v1"
)

# generate new routers
protected_router = APIRouter()
unprotected_router = APIRouter()

# use the respective router
@protected_router.get("/secretdata")
def secretdata() -> dict:
    return 'data'

@unprotected_router.get("/")
def is_alive() -> dict:
    return True

# include the routers in the application, and add dependencies where needed
app.include_router(protected_router, dependencies=[Security(validate_api_key)])
# note: no dependency for this one
app.include_router(unprotected_router)
For this to be a bit cleaner you would usually split these routers into separate files, as shown in the previously mentioned documentation!
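For example, the split might roughly look like the following sketch (the module and variable names are just a suggestion; the main.py wiring is shown as comments):

# routers/protected.py (suggested module name)
from fastapi import APIRouter

router = APIRouter()

@router.get("/secretdata")
def secretdata() -> dict:
    return {"data": "secret"}

# main.py would then import and include it, roughly:
# from routers import protected, public
# app.include_router(protected.router, dependencies=[Security(validate_api_key)])
# app.include_router(public.router)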
I am new to Airflow. I have written code to submit an HTTP POST using SimpleHttpOperator. In this case the POST request returns a token; I need help on how to read the response body.
get_templates = SimpleHttpOperator(
    task_id='get_templates',
    method='POST',
    endpoint='myendpoint',
    http_conn_id='myconnection',
    trigger_rule="all_done",
    headers={"Content-Type": "application/json"},
    xcom_push=True,
    dag=dag
)
Looks like POST was successful. Now my question is how to read the response body.
This is the output of the code; there are no errors:
[2019-05-06 20:08:40,518] {http_hook.py:128} INFO - Sending 'POST' to url: https://auth.reltio.com/oauth//token?username=perf_api_user&password=perf_api_user!&grant_type=password
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
The execute function of the SimpleHttpOperator returns the response.text (source). By looking at the Airflow documentation for XCom, you can see that:
... if a task returns a value (either from its Operator’s execute() method, or from a PythonOperator’s python_callable function), then an XCom containing that value is automatically pushed.
meaning the response body is pushed to the XCom and is available for downstream tasks to access.
For example, you could have a PythonOperator fetching it via:
response_body = context['task_instance'].xcom_pull(task_ids='get_templates')
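A minimal sketch of such a downstream task, assuming Airflow 1.10-style provide_context and that the response body is JSON with a hypothetical access_token field:

import json

from airflow.operators.python_operator import PythonOperator

def _read_token(**context):
    # the full response body that get_templates pushed as its return value
    response_body = context['task_instance'].xcom_pull(task_ids='get_templates')
    token = json.loads(response_body).get('access_token')  # hypothetical field name
    print(token)

read_token = PythonOperator(
    task_id='read_token',
    python_callable=_read_token,
    provide_context=True,  # needed on Airflow 1.10.x
    dag=dag,
)

get_templates >> read_token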
Additionally, if you just want to log the response instead of processing it, you can set log_response=True in the SimpleHttpOperator constructor.
If you use Airflow 2, the xcom_push argument is not available in SimpleHttpOperator. In this case, let's say you call /my-url in a task call_api; to get the response and pass it to another task, you need to read the return_value XCom that the SimpleHttpOperator pushes automatically:
import json

from airflow.operators.python import PythonOperator
from airflow.providers.http.operators.http import SimpleHttpOperator

call_api = SimpleHttpOperator(
    task_id='call_api',
    http_conn_id=api_connection,
    method='GET',
    endpoint='/my-url',
    response_filter=lambda response: json.loads(response.text),
    log_response=True,  # Shows the response in the task log
    dag=dag
)

def _read_response(ti):
    val = ti.xcom_pull(
        task_ids='call_api',
        key='return_value'
    )
    print(val)

read_response = PythonOperator(
    task_id='read_response',
    python_callable=_read_response,
    dag=dag
)
You can also specify dag_id in ti.xcom_pull to select the running dag.
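For example (the dag_id value here is just a placeholder):

val = ti.xcom_pull(
    task_ids='call_api',
    key='return_value',
    dag_id='my_api_dag',  # placeholder: the DAG that owns the XCom
)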