I configured an Airflow (v2.2.2) DAG to read variables from a secrets backend. The variables are used by a BashOperator via a Jinja template.
The problem is that I can't get variable masking to work using the var.json syntax.
The following example works for me and password is masked in Airflow logs:
BashOperator(
    task_id='my_id',
    bash_command="some-command --user={{ var.value.MY_USER }} --password={{ var.json.MY_PASSWORD }}",
    ...
)
Now I want to store the username and password in the same variable. The credentials are retrieved correctly, but unfortunately the password is not masked in the logs.
BashOperator(
    task_id='my_id',
    bash_command="some-command --user={{ var.json.VARIABLE_NAME_SECRET.user }} --password={{ var.json.VARIABLE_NAME_SECRET.password }}",
    ...
)
Is it possible to mask a value inside a dictionary when the variable is retrieved already JSON-deserialized?
Maybe there is a bug in the masking method. Instead, you can store your username and password in an Airflow connection, then access them with the same templating method:
BashOperator(
    task_id='my_id',
    bash_command="some-command --user={{ conn.connection_id.login }} --password={{ conn.connection_id.password }}",
    ...
)
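For context, the masking itself is essentially value-based redaction applied to log records. A minimal pure-Python sketch of the concept (this is not Airflow's actual secrets masker, just an illustration):

```python
import logging

# Conceptual sketch of value-based secret masking in logs
# (NOT Airflow's implementation).
class RedactSecrets(logging.Filter):
    def __init__(self, secrets):
        super().__init__()
        self.secrets = list(secrets)

    def filter(self, record):
        # Replace every known secret value in the rendered message.
        msg = record.getMessage()
        for secret in self.secrets:
            msg = msg.replace(secret, "***")
        record.msg, record.args = msg, ()
        return True
```

This also hints at why the dict case could slip through: redaction of this kind works on registered string values, so a password that only ever appears as a member of a deserialized dict may never get registered as a secret.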
I'm currently experimenting with a new concept where the operator communicates with an external service that runs the work instead of the operator running locally, and the external service reports back to Airflow to update the progress of the DAG.
For example, let's say we have a bash operator:
bash_task = BashOperator(
    task_id="bash_task",
    bash_command="echo \"This Message Shouldn't Run Locally on Airflow\"",
)
That is part of a DAG:
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="my_dag") as dag:
    t1 = BashOperator(
        task_id="bash_task1",
        bash_command="echo \"t1: This Message Shouldn't Run Locally on Airflow\""
    )
    t2 = BashOperator(
        task_id="bash_task2",
        bash_command="echo \"t2: This Message Shouldn't Run Locally on Airflow\""
    )
    t1 >> t2
Is there a method in the Airflow code that will allow an external service to tell the DAG that t1 has started/completed and that t2 has started/completed, without actually running the DAG on the Airflow instance?
Airflow has a concept of Executors, which are responsible for running scheduled tasks, often via or on external services such as Kubernetes, Dask, or a Celery cluster.
https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html
The worker process communicates back to Airflow, often via the metadata DB, about the progress of the task.
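The contract is roughly: the executor hands each task to a backend, then periodically syncs the backend's view of task state back into Airflow. A stripped-down sketch of that idea in plain Python (hypothetical names, not Airflow's actual Executor API):

```python
# Conceptual sketch of an executor that delegates work to an external
# service and mirrors its reported state back (hypothetical, not Airflow's API).
class RemoteExecutor:
    def __init__(self, service):
        self.service = service   # client for the external service (assumed)
        self.states = {}         # task_id -> last known state

    def submit(self, task_id, command):
        # Hand the command off; nothing executes locally.
        self.service.run(task_id, command)
        self.states[task_id] = "queued"

    def sync(self):
        # Poll the external service and record each task's progress,
        # the way a real executor reports back via the metadata DB.
        for task_id in list(self.states):
            self.states[task_id] = self.service.status(task_id)
```

A custom executor along these lines is how the external service could mark t1 and t2 started/completed without the commands ever running on the Airflow instance itself.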
I am not sure how to trace an existing project written in YAML for networking devices.
I have set up the system correctly and it executes all the tasks perfectly, but I want to check what data is being assigned.
Is there a way to trace Ansible just like Python?
For example, in Python I can use the ipdb module or just a print() statement to inspect all kinds of things.
Ansible provides a playbook debugger that can be used to trace the execution of tasks.
If you want to debug everything in a play, you can set debugger: always:
- name: some play
  hosts: all
  debugger: always
  tasks: ...
Then you can use the c command to continue to the next task, p task_vars to see the variables, or p result._result to see the result.
The debugger can also be used at the task or role level, like this:
- hosts: all
  roles:
    - role: dj-wasabi.zabbix-agent
      debugger: always
This helps you avoid polluting your roles with debug tasks while limiting the scope of debugging.
The other method is the debug module, which is similar to using print statements in Python. You can use it in your tasks like this:
# Example that prints the uuid and gateway for each host
- debug:
    msg: System {{ inventory_hostname }} has uuid {{ ansible_product_uuid }}

- debug:
    msg: System {{ inventory_hostname }} has gateway {{ ansible_default_ipv4.gateway }}
  when: ansible_default_ipv4.gateway is defined

# Example that prints return information from the previous task
- shell: /usr/bin/uptime
  register: result

- debug:
    var: result
    verbosity: 2
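The verbosity: 2 field on that last debug task gates output the way a log-level threshold does: the message only appears when the playbook is run with at least -vv. A rough Python analogy (not Ansible code):

```python
# Rough analogy for `verbosity: 2` on an Ansible debug task:
# the message only appears when the run's -v level meets the threshold.
def debug(msg, verbosity=0, run_verbosity=0):
    if run_verbosity >= verbosity:
        return msg
    return None  # suppressed, like a debug task below the -v level
```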
I recently upgraded from Airflow 1.9 to 1.10 and performed the following steps:
airflow upgradedb
changed all my Celery config names mentioned here
export SLUGIFY_USES_TEXT_UNIDECODE=yes
added log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ execution_date.strftime("%%Y-%%m-%%dT%%H:%%M:%%S") }}/{{ try_number }}.log to my config
Jobs seem to be running fine, but when I click on a DAG task node, no logs appear.
I opened my network tab and a request to the following url is returning this JSON
$AIRFLOW_URL/ariflow/get_logs_with_metadata?dag_id=xxxx&task_id=xxxxx&execution_date=2018-09-09T23%3A03%3A10.585986%2B00%3A00&try_number=1&metadata=null
{"error":true,"message":["Task log handler file.task does not support read logs.\n'NoneType' object has no attribute 'read'\n"],"metadata":{"end_of_log":true}}
Additionally, there is a 404 request for js/form-1.0.0.js. Any advice on extra steps to get the logs working again?
I can confirm that logs are showing up in the logs directory for tasks on the airflow server.
Using https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg as a reference, I previously had
task_log_reader = file.task
and changed it to:
task_log_reader = task
I also added:
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
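To see the kind of path those settings produce, here is a quick pure-Python substitution of the template with hypothetical task-instance values (Airflow renders this with Jinja; this is only an illustration):

```python
# Hypothetical values for one task instance.
values = {
    "ti.dag_id": "my_dag",
    "ti.task_id": "my_task",
    "ts": "2018-09-09T23:03:10+00:00",
    "try_number": "1",
}

template = "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"
path = template
for key, val in values.items():
    path = path.replace("{{ %s }}" % key, val)

print(path)  # my_dag/my_task/2018-09-09T23:03:10+00:00/1.log
```

If the reader's template doesn't match where the log files were actually written, the UI fails with exactly the kind of read error shown above.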
I'm trying to execute the redis-trib.rb utility (used to configure Redis clusters) on the Salt orchestrator, after executing Salt states on multiple minions to bring up the Redis processes.
Reading the Salt documentation, it looks like the orchestrate runner does what I want for executing the minion states.
Indeed, this snippet works perfectly when executed with sudo salt-run state.orchestrate orch.redis_cluster:
redis_cluster_instances_create:
  salt.state:
    - tgt: '*redis*'
    - highstate: True
The problem is with the next step, which requires me to call redis-trib.rb on the orchestrator. Reading through the documentation, it looks like I need to use the salt.runner state (which executes another runner) to call the salt.cmd runner (which executes a salt function locally on the master), which in turn calls cmd.run to actually execute the command.
What I have looks like this:
redis_cluster_setup_masters_{{ cluster }}:
  salt.runner:
    - name: salt.cmd
    - fun: cmd.run
    - args:
      - srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb create {% for shard in shards %}{{ shard['master'] }} {% endfor %}
    - kwargs:
        unless: srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb info {{ shards[0]['master'] }} | grep 'cluster_state:ok'
    - require:
      - salt: redis_cluster_instances_create
But it doesn't work, and Salt errors out with:
lab-orchestrator_master:
----------
          ID: redis_cluster_instances_create
    Function: salt.state
      Result: True
     Comment: States ran successfully. No changes made to lab-redis04, lab-redis01, lab-redis02, lab-redis03.
     Started: 09:54:57.811313
    Duration: 14223.204 ms
     Changes:
----------
          ID: redis_cluster_setup_masters_pdnocg
    Function: salt.runner
        Name: salt.cmd
      Result: False
     Comment: Exception occurred in runner salt.cmd: Traceback (most recent call last):
                File "/usr/lib/python2.7/site-packages/salt/client/mixins.py", line 392, in _low
                  data['return'] = self.functions[fun](*args, **kwargs)
              TypeError: cmd() takes at least 1 argument (0 given)
     Started: 09:55:12.034716
    Duration: 1668.345 ms
     Changes:
Can anyone suggest what I'm doing wrong, or an alternative way of executing commands locally on the orchestrator?
The problem is that you are passing fun et al. to the runner instead of to the execution module. Also note that you have to pass the arguments via arg, not args:
redis_cluster_setup_masters_{{ cluster }}:
  salt.runner:
    - name: salt.cmd
    - arg:
      - fun=cmd.run
      - cmd='srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb create {% for shard in shards %}{{ shard['master'] }} {% endfor %}'
    - unless: srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb info {{ shards[0]['master'] }} | grep 'cluster_state:ok'
    - require:
      - salt: redis_cluster_instances_create
That should do the trick, though I haven't tested it with the kwargs parameter.
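Note what the unless guard buys you: cmd.run runs the guard command first, and a success (zero exit status) suppresses the main command, which keeps the cluster create idempotent. The control flow, sketched in plain Python (not Salt internals):

```python
# Sketch of cmd.run's `unless` semantics (pure Python, not Salt):
# run the guard command first; if it succeeds, skip the main command.
def run_with_unless(main_cmd, unless_succeeds):
    if unless_succeeds:
        return "skipped: unless condition already satisfied"
    return "ran: " + main_cmd
```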
Suppose I have different credentials in two different environments, but that's the only thing that differs between them, and I don't want to make extra pillar files for a single item.
Suppose I attack the problem like this:
{%- set deployment = grains.get('deployment') %}
{%- load_yaml as credentials %}
prod: prodpassword
test: testpassword
dev: devpassword
{%- endload %}

some_app:
  user: someuser
  password: {{ credentials[deployment] }}
  ...more configuration here...
This works as expected. But can a minion in test theoretically get the password for prod? That depends on whether the dict lookup happens before or after the data is sent to the client, which in turn depends on when the Jinja is rendered. Does the master render it first and then send the resulting data, or does the minion receive the pillar file as-is and render it itself?
Pillar data is always rendered on the master, never the minion. The master does have access to the minion's grains, however, which is why your example works.
Given a Pillar SLS file with the following contents:
test: {{ grains['id'] }}
The following pillar data will result:
# salt testminion pillar.item test
testminion:
    ----------
    test:
        testminion
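The practical consequence for the credentials example above: the lookup credentials[deployment] is evaluated during the master-side render, so only the selected value ever leaves the master. A pure-Python sketch of that ordering (not Salt's actual renderer):

```python
# Simulate master-side pillar rendering: the full credentials dict exists
# only while rendering; the minion receives the already-substituted value.
credentials = {"prod": "prodpassword", "test": "testpassword", "dev": "devpassword"}

def render_pillar(deployment_grain):
    # The dict lookup happens here, on the master, before data is sent.
    return {"some_app": {"user": "someuser",
                         "password": credentials[deployment_grain]}}

rendered = render_pillar("test")
print(rendered["some_app"]["password"])  # testpassword
```

A test minion therefore never sees prodpassword anywhere in its pillar data.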
Source: I'm a SaltStack core developer.