Authenticate Google Composer http call task with IAP protected app - airflow

I have a setup where I have an App Engine REST application and a Google Composer / Airflow DAG that has a task which is supposed to fetch data from one of the endpoints of the app. The app is protected by IAP. I have added the service account under which Airflow runs to the "IAP-secured Web App User" list, however each time the step executes, the response to the HTTP call is the Google Sign-In page. Any idea if any additional step is needed?
The code for my DAG step:
import requests

def get_data():
    r = requests.get(url="<url-to-my-app-endpoint>")
    print('stuff:')
    print(r.status_code)
    print(r.content)
    return 1
# ...
python_fetch_data = PythonOperator(
    task_id='python_fetch_data',
    python_callable=get_data,
    dag=dag,
    depends_on_past=True,
    priority_weight=2
)

https://cloud.google.com/iap/docs/authentication-howto#authenticating_from_a_service_account explains how to extend your DAG code so that it sends credentials to the IAP-protected API backend.
A bit of background: since Composer is built on top of GCP, your Composer deployment runs as a unique service account identity. You can add that service account to the IAP access list for your endpoint.
I don't know if the Composer UI makes it easy to see the "email" address of that service account, but if you add the code from that guide and decode the token it generates, the token will show it.
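For illustration, a minimal sketch of what that guide's approach can look like inside the DAG task, assuming a recent google-auth library; IAP_CLIENT_ID (the OAuth client ID your IAP configuration uses) and the endpoint URL are placeholders you fill in:
import requests
import google.auth.transport.requests
from google.oauth2 import id_token

IAP_CLIENT_ID = '<your-iap-oauth-client-id>'   # placeholder: the OAuth client ID IAP uses
ENDPOINT_URL = '<url-to-my-app-endpoint>'      # placeholder: the protected endpoint

def get_data():
    # Fetch an OpenID Connect token for the IAP client ID, using the service
    # account Composer runs as (via Application Default Credentials).
    auth_request = google.auth.transport.requests.Request()
    open_id_token = id_token.fetch_id_token(auth_request, IAP_CLIENT_ID)

    # Send the token as a Bearer token; IAP accepts it instead of redirecting
    # to the Google Sign-In page.
    r = requests.get(ENDPOINT_URL,
                     headers={'Authorization': 'Bearer {}'.format(open_id_token)})
    print(r.status_code)
    print(r.content)
    return 1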

Related

Unable to access Airflow REST API

I have set up Airflow on my local machine. I am trying to access the below Airflow link:
http://localhost:8080/api/experimental/test/
I am getting the Airflow 404 page (lots of circles).
I have tried to set auth_backend to default, but no luck.
What changes do I need to make in airflow.cfg to be able to make REST API calls to Airflow for triggering DAGs?
The experimental API is disabled by default in Airflow 2. It was used in 1.10, but it has been deprecated and is disabled out of the box in Airflow 2. Instead you should use the fully-fledged stable REST API, which uses a completely different URL scheme:
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
In the Airflow UI you can even browse and try the API (just look at the menus of Airflow).
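As a rough sketch (not part of the original answer), triggering a DAG through the stable REST API can look like this, assuming you have enabled basic auth in airflow.cfg ([api] auth_backends = airflow.api.auth.backend.basic_auth; the key is auth_backend on older 2.x releases) and that my_dag_id and the credentials are placeholders:
import requests

# Trigger a DAG run via the Airflow 2 stable REST API.
# 'my_dag_id' and 'airflow'/'airflow' are placeholders for your DAG id and user.
resp = requests.post(
    'http://localhost:8080/api/v1/dags/my_dag_id/dagRuns',
    auth=('airflow', 'airflow'),
    json={'conf': {}},
)
print(resp.status_code, resp.json())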

Cancellation token triggered after a while

I have a microservice application that has many projects.
I called a service that needs three other service projects to complete its task.
When I call the method, the cancellation token gets triggered and I do not know what the error is.
It works perfectly in debug mode, but when I published it on the server it shows me:
The operation was canceled.
The method is async and Task-based.
Increasing the timeout on IIS did not work.
Task.Run()=>(mytask()).(SetMyTimout); didn't work
await mymethod().wait(timeout); didn't work

Google Cloud Task not calling http requests

I have created a Google Cloud Task, the queue keeps retrying, and the function is not getting invoked either.
This is the log from the Cloud console.
attemptResponseLog: {
  attemptDuration: "0.133874s"
  dispatchCount: "19"
  maxAttempts: 0
  responseCount: "0"
  retryTime: "2020-06-21T21:20:18.518655Z"
  scheduleTime: "2020-06-21T21:20:15.718098Z"
  status: "UNAVAILABLE"
  targetAddress: "POST some url"
  targetType: "HTTP"
}
I ran into this same error, and I must say that the documentation is not clear enough.
WARNING: I feel there's a bit of latency before role changes are taken into account, especially for the Service Account User one.
I made multiple tests and tried to keep the lowest rights possible, so I did try to remove some... do some tests, it works... great, that right isn't necessary... came back later, and the thing is broken.
Here is my setup:
I use Cloud Scheduler to trigger a Cloud Function every 15 minutes by posting a message on a queue.
That Cloud Function builds a list of tasks to compute stats on MySQL and creates the tasks.
Another Cloud Function runs the SQL queries to get the stats and stores the results in Firestore.
I use Cloud Tasks so that the load on MySQL is not too heavy.
Below, I use functional names to make it easy to understand.
TaskCreatorCloudFunction running with TaskCreatorServiceAccount
TaskCreatorServiceAccount requires:
the "Cloud Tasks Enqueuer" role #1
to be a Service Account User on the CloudTaskComputeStatsServiceAccount (see below) #2
the roles needed to do the job (read SQL to get the list of tasks to create, write logs, access Secret Manager, listen to Pub/Sub since it's triggered by Cloud Scheduler via Pub/Sub)
TaskImplementationCloudFunction (HTTP) running with TaskImplementationServiceAccount
TaskImplementationServiceAccount has no specific role for Cloud Tasks, only the ones needed to do the job (read SQL, write logs, access Secret Manager, write to Firestore)
The task queue is named "compute-stats-on-mysql".
I've created a dedicated service account named CloudTaskComputeStatsServiceAccount #3
CloudTaskComputeStatsServiceAccount has the specific rights for the whole thing to work:
the Cloud Functions Invoker role #4
being added as a Service Account User on TaskImplementationServiceAccount #5
To do the last one in the console (script version below), you need to:
go to IAM -> Service Accounts
check the TaskImplementationServiceAccount
in the upper right corner, click "Show Info Panel" if it's not already displayed
click Add Member
paste the full name of the CloudTaskComputeStatsServiceAccount
choose Service Account User as the role
save
You can do all of this in the console, but it's better to script it:
gcloud tasks queues create compute-stats-on-mysql \
  --max-dispatches-per-second=10 \
  --max-concurrent-dispatches=15 \
  --max-attempts=2 \
  --min-backoff=1s
#3
gcloud iam service-accounts create CloudTaskComputeStatsServiceAccount --description="Service Account for the cloud task compute-stats-on-mysql" --display-name="Service Account for the cloud task compute-stats-on-mysql"
#4
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member serviceAccount:CloudTaskComputeStatsServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com --role "roles/cloudfunctions.invoker"
#1
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member serviceAccount:TaskCreatorServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com --role "roles/cloudtasks.enqueuer"
#5
gcloud iam service-accounts add-iam-policy-binding TaskImplementationServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com --member="serviceAccount:CloudTaskComputeStatsServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com" --role "roles/iam.serviceAccountUser"
#2
gcloud iam service-accounts add-iam-policy-binding CloudTaskComputeStatsServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com --member="serviceAccount:TaskCreatorServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/iam.serviceAccountUser"
When creating the task, you use the CloudTaskComputeStatsServiceAccount in the oidcToken:
const body = Buffer.from(JSON.stringify(data)).toString('base64');
const task = {
  httpRequest: {
    httpMethod: 'POST',
    url,
    oidcToken: {
      serviceAccountEmail: `CloudTaskComputeStatsServiceAccount@${PROJECT_ID}.iam.gserviceaccount.com`,
    },
    headers: {
      'Content-Type': 'application/json',
    },
    body,
  },
};
My understanding is that when you run
const [response] = await cloudTasksClient.createTask({parent, task});
the Cloud Function (task creator) needs to create the task and act as the CloudTaskComputeStatsServiceAccount,
and CloudTaskComputeStatsServiceAccount needs to have the Cloud Functions Invoker role and act as the target Cloud Function's service account.
Indeed it's not a service account issue; it's the OIDC token audience that is missing. It seems that for Cloud Functions this is needed. I found two references... you can reproduce this problem with the OIDC token in the CLI by omitting this argument to gcloud tasks create-http-task:
--oidc-token-audience=OIDC_TOKEN_AUDIENCE
The audience to be used when generating an OpenID Connect token to be included in the request sent to the target when executing the task. If not specified, the URI specified in the target will be used.
The second reference that popped up, in Ruby, shows the audience:
https://googleapis.dev/ruby/google-cloud-tasks/latest/Google/Cloud/Tasks/V2/OidcToken.html
Code using google-cloud-tasks 1.5.0; the task object looks like this, where url_oidc is just the URL of the Cloud Function (i.e. the trigger URL, with no URL parameters):
# Construct the request body.
task = {
    'http_request': {  # Specify the type of request.
        'http_method': 'GET',
        'url': url_final,  # The full URL path that the task will be sent to.
        'oidc_token': {
            'service_account_email': service_account_email,
            'audience': url_oidc
        }
    }
}
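For context, a sketch of how that task dict might then be submitted; project, location, and the client variable below are placeholders, and the positional create_task(parent, task) call matches the 1.x client (2.x clients take create_task(request={'parent': ..., 'task': ...})):
from google.cloud import tasks_v2

# Hypothetical surrounding code for the snippet above (placeholder values).
project = 'my-project'
location = 'europe-west1'

client = tasks_v2.CloudTasksClient()
parent = client.queue_path(project, location, 'compute-stats-on-mysql')
response = client.create_task(parent, task)  # 'task' is the dict built above
print('Created task: {}'.format(response.name))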

Control-M batch job is spawning multiple instances of a singleton ActiveX server

As part of a batch job I create 4 command lines through Control-M which invoke a legacy console application written in VB6. The console application invokes an ActiveX server which performs a set of analytic jobs calculating outputs. The ActiveX server was coded as a singleton, but when invoked through Control-M I get 4 instances running. The ActiveX server does not tear down once the job has completed and the command line has closed itself.
I created 4 .bat files which, once launched manually on the server, simulate the calls made through Control-M, and the ActiveX server behaves as expected, i.e. there is only 1 instance ever running and once complete it closes down gracefully.
What am I doing wrong?
Control-M jobs run under a service account, the same as when we log in as a user and execute a job. How did you test this? Did you manually execute each batch job one after another, or did you execute all the batch jobs at the same time from different terminals? You can do one thing: run the Control-M jobs with a time interval, like the first one at 09:00, the second at 09:05, the third at 09:10 and the fourth at 09:15, and see if that fixes your issue.
Maybe your job cannot use the Desktop environment.
Check your agent service settings:
Log on As:
User account under which Control‑M Agent service will run.
Valid values:
Local System Account – Service logs on as the system account.
Allow Service to Interact with Desktop – This option is valid only if the service is running as a local system account.
Selected – the service provides a user interface on a desktop that can be used by whoever is logged in when the service is started. Default.
Unselected – the service does not provide a user interface.
This Account – User account under which Control‑M Agent service will run.
NOTE: If the owner of any Control-M/Server jobs has a "roaming profile" or if job output (OUTPUT) will be copied to or from other computers, the Log in mode must be set to This Account.
Default: Local System Account

Can Cloud Functions for Firebase be used across projects?

I was hoping to trigger a Pub/Sub function (using functions.pubsub / onPublish) whenever a new Pub/Sub message is sent to a topic/subscription in a third-party project, i.e. across projects.
After some research and experimentation I found that TopicBuilder throws an error if the topic name contains a / and it defaults to "projects/" + process.env.GCLOUD_PROJECT + "/topics/" + topic (https://github.com/firebase/firebase-functions/blob/master/src/providers/pubsub.ts).
I also found a post in Stack Overflow that says that "Firebase provides a (relatively thin) wrapper around Google Cloud Functions"
(What is the difference between Cloud Function and Firebase Functions?)
This led me to look into Google Cloud Functions. Whilst I was able to create a subscription in a project I own to a topic in a third-party project - after changing permissions in IAM - I could not find a way to associate a function with the topic. Nor was I successful in associating a function with a topic and subscription in a third-party project. In the console I only see the topics in my project, and I had no success using gcloud.
Has anyone had any success in using a function across projects and, if so, how did you achieve this and is there a documentation URL you could provide? If a function can't be triggered by a message to a topic and subscription in a third-party project can you think of a way that I could ingest third-party Pub/Sub data?
As Pub/Sub fees are billed to the project that contains the subscription I would prefer that the subscription resides in the third-party project with the topic.
Thank you
Google Cloud Functions currently does not allow a function to listen to a resource in another project. For Cloud Pub/Sub triggers specifically, you could get around this by deploying an HTTP function and adding a Pub/Sub push subscription, on the topic that you want to fire that cross-project function, which pushes to the function's URL (see the sketch below).
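A minimal sketch of that workaround with the Python Pub/Sub client; all project, topic, subscription, and function names here are placeholders, and you would still need IAM permissions on the third-party topic:
from google.cloud import pubsub_v1

# Placeholders: the third-party project/topic and the HTTP function's trigger URL.
THIRD_PARTY_PROJECT = 'third-party-project'
TOPIC = 'their-topic'
FUNCTION_URL = 'https://REGION-MY_PROJECT.cloudfunctions.net/myHttpFunction'

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(THIRD_PARTY_PROJECT, TOPIC)
subscription_path = subscriber.subscription_path(THIRD_PARTY_PROJECT, 'push-to-my-function')

# Create a push subscription that POSTs each message to the HTTP-triggered function.
push_config = pubsub_v1.types.PushConfig(push_endpoint=FUNCTION_URL)
subscriber.create_subscription(
    name=subscription_path, topic=topic_path, push_config=push_config
)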
A Google Cloud Function can't be triggered by a subscription to a topic of another project (since you can't subscribe to another project's topic).
But a Google Cloud Function can publish to a topic of another project (and then subscribers of this topic will be triggered).
I solved it by establishing a Google Cloud Function in the original project which listens to the original topic and reacts by publishing to a new topic in the other project. Therefore, the service account (...@appspot.gserviceaccount.com) of this function "in the middle" needs to be authorized on the new topic (console.cloud.google.com/cloudpubsub/topic/detail/...?project=...), i.e. added as a principal with the role "Pub/Sub Publisher".
import base64
import json
import os
from google.cloud import pubsub_v1

# https://cloud.google.com/functions/docs/calling/pubsub?hl=en#publishing_a_message_from_within_a_function
# Instantiates a Pub/Sub client
publisher = pubsub_v1.PublisherClient()

def notify(event, context):
    project_id = os.environ.get('project_id')
    topic_name = os.environ.get('topic_name')

    # References an existing topic
    topic_path = publisher.topic_path(project_id, topic_name)

    message_json = json.dumps({
        'data': {'message': 'here would be the message'},  # or you can pass the message of event/context
    })
    message_bytes = message_json.encode('utf-8')

    # Publishes a message
    try:
        publish_future = publisher.publish(topic_path, data=message_bytes)
        publish_future.result()  # Verify the publish succeeded
        return 'Message published.'
    except Exception as e:
        print(e)
        return (e, 500)
Google Cloud Endpoints can be an easier solution for adding auth to the HTTP function.
