How to capture user who triggered Airflow DAG - airflow

I've enabled RBAC as an environment variable in my docker-compose file:
- AIRFLOW__WEBSERVER__RBAC=True
I want to capture the user who kicked off a DAG inside my DAG files.
I tried using from flask_login import current_user, but the value of current_user is None.
How can I capture the user details with RBAC enabled?

According to the Airflow documentation, as part of the RBAC security model handled by Flask AppBuilder (FAB):
Airflow uses flask_login and exposes a set of hooks in the
airflow.default_login module. You can alter the content and make it
part of the PYTHONPATH and configure it as a backend in
airflow.cfg.
The Flask-Login module provides the user management operations, so you can fetch the current user through the flask_login.current_user proxy, which gains some extra fields as described in pull request #3438:
if current_user and hasattr(current_user, 'user'):
    user = current_user.user.username
elif current_user and hasattr(current_user, 'username'):
    user = current_user.username
I suppose you can use current_user.user.username to fetch the user's login.
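As a minimal sketch of that fallback pattern (assuming the code runs inside a Flask request context such as a webserver plugin or view; inside a task on a worker, current_user will still be None), it could be wrapped in a hypothetical helper:

from flask_login import current_user

def get_triggering_username(default='anonymous'):
    # Best-effort lookup of the logged-in user, following the fallback
    # pattern quoted above from PR #3438.
    if current_user and hasattr(current_user, 'user'):
        return current_user.user.username
    if current_user and hasattr(current_user, 'username'):
        return current_user.username
    return default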

Related

Airflow incorrectly always picks the built-in default credentials under Admin -> Connections

I am trying to connect to a MySQL database using MySqlHook. Under Admin -> Connections I have defined a new connection of type mysql with the name myappname_db. I have used this in my code as drupalHook = MySqlHook(conn_name_attr='myappname_db').
However, when I run the DAG locally, I see that it picks up the built-in default connection from Admin -> Connections, i.e. mysql_default, instead of myappname_db.
To rectify this, do I need to update any setting in airflow.cfg or any other config?
Thanks.
The MySqlHook uses an attribute mysql_conn_id to store the connection id.
conn_name_attr refers to the name of that attribute (mysql_conn_id in this case), which is fetched dynamically inside the MySqlHook.
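Put differently, pass the connection id itself through mysql_conn_id instead of conn_name_attr. A minimal sketch (the import path is for Airflow 1.10; the query is a placeholder):

from airflow.hooks.mysql_hook import MySqlHook

# Pass the connection id, not the name of the attribute that stores it
drupal_hook = MySqlHook(mysql_conn_id='myappname_db')
records = drupal_hook.get_records('SELECT 1')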

How do I access user configuration inside a dag file for airflow?

I want to pass the configuration that is submitted through the web UI to DatabricksSubmitRunOperator's notebook_task parameter.
I know how to do this with the PythonOperator, but not with the Databricks operator.
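One hedged approach, assuming the operator's json field (into which notebook_task is merged) is templated, as it is in recent releases, so Jinja expressions are rendered at run time from the JSON submitted via "Trigger DAG w/ config"; the DAG id, cluster spec, notebook path and parameter names below are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator

with DAG('databricks_conf_example', start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    submit_run = DatabricksSubmitRunOperator(
        task_id='notebook_run',
        new_cluster={
            'spark_version': '7.3.x-scala2.12',
            'node_type_id': 'i3.xlarge',
            'num_workers': 2,
        },
        notebook_task={
            'notebook_path': '/Users/someone@example.com/my_notebook',
            'base_parameters': {
                # rendered at run time from the conf passed via "Trigger DAG w/ config"
                'run_date': "{{ dag_run.conf.get('run_date', ds) }}",
            },
        },
    )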

Airflow logs in s3 bucket

I would like to write the Airflow logs to S3. The following are the parameters that need to be set according to the docs:
remote_logging = True
remote_base_log_folder =
remote_log_conn_id =
If Airflow is running in AWS, why do I have to pass the AWS keys? Shouldn't the boto3 API be able to write/read to S3 if the correct permissions are set on the IAM role attached to the instance?
Fair point, but I think it allows for more flexibility if Airflow is not running on AWS, or if you want to use a specific set of credentials rather than give the entire instance access. It may also have been the easier implementation, because the underlying code for writing logs to S3 uses the S3Hook (https://github.com/apache/airflow/blob/1.10.9/airflow/utils/log/s3_task_handler.py#L47), which requires a connection id.
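On that note, the keys themselves can usually be left out: if the AWS connection referenced by remote_log_conn_id has no login/password, the S3Hook should fall back to boto3's default credential chain, which includes the instance's IAM role. A minimal sketch (the import path is for Airflow 1.10; bucket, key and connection id are illustrative):

from airflow.hooks.S3_hook import S3Hook

# 'aws_default' with empty login/password: the hook falls back to boto3's
# default credential chain (environment, shared config, instance IAM role).
hook = S3Hook(aws_conn_id='aws_default')
hook.load_string('log line', key='logs/test.log', bucket_name='my-airflow-logs')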

How to access a "plone site object" without context?

I have a scheduled job (I'm using the apscheduler.scheduler lib) that needs access to the Plone site object, but I don't have the context in this case. I subscribed to the IProcessingStart event, but unfortunately the getSite() function returns None.
Also, is there a programmatic way to obtain a specific Plone site from the Zope server root?
Additional info:
I have a job like this:
from apscheduler.scheduler import Scheduler
from Products.CMFCore.utils import getToolByName
from zope.site import hooks

sched = Scheduler()

@sched.cron_schedule(day_of_week="*", hour="9", minute="0")
def myjob():
    site = hooks.getSite()
    print site
    print site.absolute_url()
    catalogtool = getToolByName(site, "portal_catalog")
    print catalogtool
The site variable is always None inside an APScheduler job, and we need information about the site to run the job correctly.
We avoided triggering the job through a public URL because a user could then execute it directly.
Build a context first with setSite(), and perhaps a request object:
from zope.app.component.hooks import setSite
from Testing.makerequest import makerequest
app = makerequest(app)
site = app[site_id]
setSite(site)
This does require that you open a ZODB connection yourself and traverse to the site object yourself.
However, it is not clear how you are accessing the Plone site from your scheduler. Instead of running a full new Zope process, consider calling a URL from your scheduling job. If you integrated APScheduler into your Zope process, you'd have to create a new ZODB connection in the job, traverse to the Plone site from the root, and use the above method to set up the site hooks (needed for a lot of local components anyway).
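For that in-process case, a rough sketch of such a job is shown below; it assumes Zope2.app() is available inside the scheduler's process, and 'Plone' is a placeholder site id:

import transaction
import Zope2
from Testing.makerequest import makerequest
from zope.app.component.hooks import setSite

def myjob():
    app = makerequest(Zope2.app())      # new ZODB connection plus a fake request
    try:
        site = app['Plone']             # traverse to the Plone site from the root
        setSite(site)                   # set up the site hooks for local components
        catalog = site.portal_catalog
        print len(catalog())            # ...do the actual work here
        transaction.commit()
    finally:
        setSite(None)
        app._p_jar.close()              # close the ZODB connection again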

Windows Workflow - Creating a reusable task list (bookmarks?)

I'm looking at migrating business processes into Windows Workflow, the client app will be ASP/MVC and the workflows are likely to be hosted via IIS.
I want to create a common 'simple task' activity which can be used across multiple workflows. Activity properties would look something like this:
Related customer
Assigned agent
Prompt ("Please review PO #12345")
Text for 'true' button ("Accept")
Text for 'false' button ("Reject")
Variable to store result in
Once the workflow hits this activity a task should be put into a db table. The web app will query the table and show the agent a list of tasks they need to complete. Once they hit accept / reject the workflow needs to resume.
It's the last bit that I'm stuck on. What do I need to store in the DB table to resume a workflow? Given that the tasks table will be used by multiple workflows, how would I instantiate the workflow to resume it? I've looked at bookmarks, but they assume that you know the type of workflow that you're resuming. Do I need to use reflection, or is there a method in WF where I can pass a workflow id and it will instantiate it?
You can use a workflow service and control it via the ControlEndpoint.
For more info about ControlEndpoint, see
http://msdn.microsoft.com/en-us/library/ee358723.aspx
