Why does `airflow connections list` show unencrypted results?

Airflow version: 2.1.0
I set FERNET_KEY and checked that the login/password fields are encrypted when I add connections via the Web UI.
However, when I add a connection via CLI:
airflow connections add 'site_account' \
--conn-type 'login' \
--conn-login 'my_user_id' \
--conn-password 'wowpassword'
When I run airflow connections list, it shows everything as raw values (not encrypted at all).
I think this could be dangerous if I manage all connections using CLI commands (I want to make my Airflow infra restorable; that's why I tried to use CLI commands to manage connections).
How can I solve this?

Airflow decrypts the connection passwords while processing your CLI commands.
You can use airflow connections list -o yaml to see whether your record was actually encrypted in the database or not.
Furthermore, if you are able to access the CLI, you are also able to access the config, meaning you can always extract the database connection and fernet_key and get the full password on your own.
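If you want to check programmatically what is actually stored at rest, a minimal sketch using Airflow's own ORM (assuming a standard Airflow 2.x install and the site_account connection from the question) could look like this:

from airflow.models.connection import Connection
from airflow.utils.session import create_session

# Sketch: compare the raw column value with the decrypted property.
with create_session() as session:
    conn = session.query(Connection).filter(Connection.conn_id == "site_account").one()
    print(conn.is_encrypted)  # True if the stored password is Fernet ciphertext
    print(conn._password)     # raw value as stored in the metadata database
    print(conn.password)      # property that decrypts using the configured fernet_key

If is_encrypted is True while the CLI prints the plain password, the record is encrypted at rest and the CLI is simply decrypting it for display.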

Jorrick's answer is correct; however, I want to elaborate on the background, as I feel it will bridge the gap between the question and the answer.
It's very understandable that Airflow needs to be able to decrypt the connection when a DAG or user asks it to. This is needed for normal operation of the app, so Airflow must assume that a user who can author DAGs is permitted to use the system resources (Connections, Variables).
The security measures operate on a different level. If you enable them (using Fernet), Airflow encrypts the sensitive information (like connection passwords), which means that in the database itself the value is encrypted. The security concerns here are where the fernet_key is stored, whether it is rotated, and so on.
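For context, the fernet_key is just a value in airflow.cfg (or the AIRFLOW__CORE__FERNET_KEY environment variable). A rough sketch of generating a key and rotating existing encrypted values, assuming Airflow 2.x (check the docs for the exact rotation procedure), looks like:

python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# put the printed value first in the comma-separated [core] fernet_key setting,
# keeping the old key after it, then re-encrypt the existing rows:
airflow rotate-fernet-key
# once rotation is done, the old key can be removed from the setting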
There are many other security layers that handle different aspects, like access control and hiding sensitive information in the UI, but that's a different topic.
I think the important thing to understand is that security addresses two types of users:
A user who is permitted, but whose actions or visibility you want to limit. (This is more what Airflow itself handles; see the security docs.)
A user who is malicious and wants to do damage. While Airflow does provide some features in that area, this is more a question of where you set up Airflow and how well you protect it (IP allow-lists, etc.).
Keep in mind that if a malicious user gains access to the Airflow server, there is little you can do about it. Such a user can simply use their admin privileges to do anything. This is no different from a user who has hacked into any other server that you own.

Related

Multi-Tenant Airflow - access control and secrets management

Any recommendations on how to approach secrets management on a multi-tenant Airflow 2.0 instance?
We are aware of an alternative secrets backend, which would be checked before environment variables and the metastore, configured via airflow.cfg.
But how do we ensure security around this in a multi-tenant environment, e.g. how can we restrict users' and their DAGs' access to secrets/connections? Is it possible to use access control to restrict this?
We're envisaging putting connection IDs inside DAGs and, from what we understand, anybody who knows them will be able to access the Connection and extract secrets as long as they can create DAGs of their own. How can we prevent this?
There is currently (Airflow 2.1) no way to prevent anyone who can write DAGs from accessing anything in the instance. Airflow does not (yet) have a true multi-tenant setup that provides this kind of isolation. This is in the works, but it will likely not come (fully) until Airflow 3; elements of it will appear in Airflow 2 in the coming months, so you will likely be able to configure more and more isolation if you want.
For now, Airflow 2 introduced partial isolation compared to 1.10:
Parsing the DAGs is separated from the scheduler, so erroneous/malicious DAGs cannot impact the scheduling process directly.
The webserver no longer executes DAG code at all.
Currently, whoever writes DAGs can:
access the DB of Airflow directly and do anything in the database (including dropping the whole database)
read any configuration variables and connections and secrets
dynamically change the definition of any DAGs/tasks running in Airflow by manipulating the DB
And there is no way to prevent it (by design).
All of those are planned to be addressed in the coming months.
This basically means that you have to have a certain level of trust in the users who write the DAGs. Full isolation cannot be achieved; you should rely on code reviews of DAGs submitted to production to prevent any kind of abuse (very similar to any other code submitted by developers to your code base).
The only "true" isolation currently you can achieve by deploying several Airlfow instances - each with own database, scheduler, webserver. This is actually not as bad as it seems - if you have Kubernetes Cluster and use the official Helm Chart of Airflow https://airflow.apache.org/docs/helm-chart/stable/index.html. You can easily create several Airflow instances - each in a different namespace, and each using their own database schema (so you can still use single Database Server, but each instance will have to have their own separate schema). Each airflow instance will then have their own workers which can have different authentication (either via connections or via other mechanisms).
You can even provide common authentication mechanisms, for example by putting Keycloak in front of Airflow and integrating OAuth/LDAP authentication with your common auth approach for all such instances (and, for example, authorizing different groups of employees for different instances).
This provides nice multi-tenant manageability and some level of resource reuse (database, K8s cluster nodes), and if you have, for example, Terraform scripts to manage your infrastructure, adding and removing tenants becomes easy. The isolation between tenants is even better, because you can separately manage the resources used (number of workers, schedulers, etc.) for each tenant.
If you are serious about isolation and multi-tenant management, I heartily recommend that approach. Even when we achieve full isolation in Airflow 3, you will still have to manage the "resource" isolation between tenants, and having multiple Airflow instances is one way that makes this very easy (so it will also remain a valid and recommended way of implementing multi-tenancy in some scenarios).

How to make the user also the author of a task in Phabricator's Maniphest via the Conduit API?

The Conduit API in Phabricator does not support setting the authorPHID parameter when calling maniphest.createtask. I can imagine this is for security or some logical reason.
But I am developing my own frontend for Maniphest where the users (logged in through Phabricator, so they are Phabricator users and have a PHID) will add and edit tasks. What I need is that when a user creates a task, they are also the author of the task.
The problem is that I can't connect to Conduit as any user other than "apibot", because I don't have the other users' certificates in my frontend. But if I log in as "apibot", then "apibot" is set as the author of the task.
Three possible solutions came to my mind:
1. retrieve certificates directly from Phabricator's database
2. keep a list of certificates in some file in my frontend and update it manually every time somebody registers
I guess none of them are really smart...
The third solution would be nice, but I haven't found a way to do it:
3. log in as "apibot", get the certificate of userXY, and then log in as userXY
What would you suggest?

How can I use cctrlapp without constantly entering credentials?

I've started experimenting with cloudcontrol.com. They provide a CLI application called cctrlapp for managing projects.
However, many useful operations require a login. It is cumbersome and frustrating to have to put in my email address and password every time I push the current state of my app.
Can cctrlapp be configured to use stored credentials?
Recommended: We now support authentication via public keys between the CLI and the API, which is both more secure and more convenient. To set it up, simply run:
$ cctrluser setup
Read more about this here: http://www.paasfinder.com/introducing-public-key-authentication-for-cli-and-api/
Alternatively: You can set your credentials via the 'CCTRL_EMAIL' and 'CCTRL_PASSWORD' environment variables. If set, they're automatically used by cctrlapp.
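For example, a minimal shell sketch (the values are obviously placeholders):

export CCTRL_EMAIL='you@example.com'
export CCTRL_PASSWORD='your-password'
# subsequent cctrlapp invocations in this shell will pick these up instead of prompting

Keep in mind that exported credentials end up in your shell environment (and possibly its history), so the public-key setup above is the preferred route.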

Share encrypted web.config between developers

Here at work, we're building an ASP.NET MVC application as a proof of concept. Some of the operations that the application performs require transmission of credentials, so we're storing those creds in an encrypted section of the web.config. The difficulty we're having is that when one developer encrypts the data and commits it, the next developer who updates their local copy and tries to use that web.config gets exceptions, because their machine can't decrypt the config.
How should we handle this?
In the past, I've used the machine.config for sensitive credentials, e.g. connection strings and such. It's located at C:\Windows\Microsoft.Net\Framework\V4.0.30319\Config.
This allows you to leave the credentials out of commits altogether. Just make sure each developer and/or server has its own machine.config with the required credential settings.
I'm assuming that you are using aspnet_regiis.exe to encrypt the section. If this is the case, the reason you are having problems is that the keys used for encryption/decryption are different on each machine.
You can either use the same keys on all of the machines; from a configuration perspective this would be similar to a web farm setup, so you can use the information in this SO question.
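A rough sketch of the shared-key approach (the container name "SharedDevKeys" and provider name "SharedKeysProvider" are made up; double-check the exact aspnet_regiis flags for your framework version):

rem on one machine, create an exportable RSA key container
aspnet_regiis -pc "SharedDevKeys" -exp
rem export it, including the private key, to a file you can share securely
aspnet_regiis -px "SharedDevKeys" keys.xml -pri
rem import it on each other developer machine
aspnet_regiis -pi "SharedDevKeys" keys.xml
rem encrypt the section with a provider bound to that container
aspnet_regiis -pef "connectionStrings" . -prov "SharedKeysProvider"

For this to work, web.config needs an RsaProtectedConfigurationProvider entry (here called SharedKeysProvider) in configProtectedData whose keyContainerName points at the shared container, and the account running the app must be granted access to the container (aspnet_regiis -pa).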
Alternatively, since there's an inherent assumption that the developers have access to the credentials, leave the section decrypted until the app is deployed to the production server and then encrypt it. This is a common solution when the username/password are specified in the web.config as part of the connection string for database connections; the connection string is updated to point to the production DB server as part of the deployment process, just prior to encryption.
In the first place, I'm not sure why you chose this option when there are much better ways to handle keys and secrets following DevOps best practices; this seems like the classic approach. Also, at debug time any developer can peek at the actual value or write it out to a log.
Anyway, if you take the entire delivery life cycle as the context for this problem, here is what I would do to protect keys and secrets:
Do not store anything, even encrypted keys/secrets, that the team doesn't need to run locally, except for the dev or local environment.
In web.config, keep only the local or dev keys/secrets.
In the release transformation, clean out all keys/secrets to avoid their accidental use in other environments.
Use release-time variable replacement, which is common in any deployment tool; e.g. Azure/TFS DevOps deployments support it in many different ways: at the deploy-definition level, at the stage level, via library variables, or even better via a key vault store with software+hardware encryption options.
Hope this helps in your design approach at least.

How would you implement database updates via email?

I'm building a public website which has its own domain name with pop/smtp mail services. I'm considering giving users the option to update their data via email - something similar to the functionality found in Flickr or Blogger where you email posts to a special email address. The email data is then processed and stored in the underlying database for the website.
I'm using ASP.NET and SQL Server and using a shared hosting service. Any ideas how one would implement this, or if it's even possible using shared hosting?
For starters you need to have hosting that allows you to create a catch-all mailbox.
Secondly, you need a good POP3 or IMAP library, which AFAIK is not included in the .NET stack.
Then you would write a command-line application or a service that regularly checks the mailbox, pulls the messages, inserts the content into the DB based on the "To" address (which is unique for each user), and then deletes the email from the mailbox.
It's feasible and sounds like fun. Just make sure you have all you need before you start!
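Language aside, the polling loop itself is straightforward. Here is a minimal sketch in Python (host, credentials, and schema are placeholders; in an ASP.NET project you would do the same with a POP3/IMAP library and SQL Server) of the "check mailbox, store body keyed by the To address, delete message" cycle:

import poplib
import sqlite3
from email import message_from_bytes

# Sketch: poll a catch-all mailbox and store each message body keyed by the
# unique "To" address. Host, credentials, and table layout are placeholders.
def poll_mailbox():
    db = sqlite3.connect("updates.db")
    db.execute("CREATE TABLE IF NOT EXISTS updates (to_addr TEXT, body TEXT)")
    box = poplib.POP3_SSL("mail.example.com")
    box.user("catchall@example.com")
    box.pass_("secret")
    for i in range(1, len(box.list()[1]) + 1):
        raw = b"\n".join(box.retr(i)[1])
        msg = message_from_bytes(raw)
        body = msg.get_payload(decode=True) or b""
        db.execute("INSERT INTO updates VALUES (?, ?)",
                   (msg["To"], body.decode(errors="replace")))
        box.dele(i)  # remove the message once it has been stored
    db.commit()
    box.quit()

if __name__ == "__main__":
    poll_mailbox()

On shared hosting you would run something like this on a schedule (a scheduled task or background job), since you usually cannot hook into mail delivery directly.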
If the data is somewhat "critical", or at least moderately important, do NOT use their username as the "change-data address". Example: you might be tempted to create an address like username@domain.com, but instead use username-randomnumber@domain.com, where you give them the random number when they visit the web page. That way people cannot update other people's data just by knowing their username.
E-mails can be trivially forged. I would only do this if you can process PGP/S-MIME signatures in your application.
Other than that, I see no reason why not!
Use a .NET POP client to read the incoming emails, parse them for whatever you are expecting, and insert the data into the database.
See the CodeProject website for a simple POP client implementation.
You would have to decide on the email content yourself, e.g. data only, a payload of SQL statements, etc.
You could also identify the user based on sender address. This is how Tripit (and probably others) does it. This only requires one e-mail address on your end.
I have done something similar, using LumiSoft's IMAP client and scheduling a task in my app that checks the configured mail address for updates every x minutes. For scheduling I recommend Quartz.NET. No launching external processes or anything.
