Airflow DAG dynamic structure

I'm looking for a way to decide the DAG structure when the DAG is triggered, since I'm not sure how many operators I'll have to run.
Please see below for the execution sequence I'm planning to create.
           |-- Task B.1 --|                |-- Task C.1 --|
           |-- Task B.2 --|                |-- Task C.2 --|
Task A ----|-- Task B.3 --|---> Task B --->|-- Task C.3 --|
           |     ....     |                |     ....     |
           |-- Task B.N --|                |-- Task C.N --|
I'm not sure about the value of N.
Is this possible in Airflow? If so, how do I achieve it?
Thanks in advance.

I had to do something similar in the past: I wrote a DAG which read from a YAML file that defined what tasks to create.
My situation was that the number of tables I was extracting data from could change every week. Instead of re-deploying the DAG to production every time I needed to add a new table, I pointed the DAG at a YAML file describing which tables to extract. Every time a new table came along I would simply edit the YAML file with the new table details.
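A minimal sketch of that approach might look like the following; the config path, table names and extract_table callable are illustrative assumptions, not the original answer's code:

# dags/yaml_driven_extract.py -- illustrative sketch of a YAML-driven DAG
import yaml
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical config, e.g. /opt/airflow/config/tables.yml containing:
# tables:
#   - customers
#   - orders
with open("/opt/airflow/config/tables.yml") as f:
    config = yaml.safe_load(f)

def extract_table(table_name, **context):
    # Placeholder for the real extraction logic
    print(f"Extracting {table_name}")

with DAG(
    dag_id="yaml_driven_extract",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    # One task per table listed in the YAML file; adding a table to the
    # YAML adds a task without redeploying the DAG code.
    for table in config["tables"]:
        PythonOperator(
            task_id=f"extract_{table}",
            python_callable=extract_table,
            op_kwargs={"table_name": table},
        )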
I think it gets a bit trickier if an upstream task needs to run first and its result then determines how many downstream tasks to create, as in the following (different, but similar) question:
Generating dynamic tasks in airflow based on output of an upstream task

Related

Sbt in project plugin, how to structure them?

We do have a custom plugin as a single file in our project folder:
acme-project
|- ...
|- project
| |- CustomPlugin.scala
object CustomPlugin extends AutoPlugin {
  // ...
}
That was simple and easy until that plugin started to grow...
In the first step, we added more classes/objects to the same source file. However, it kept growing, and I would like to add more structure via packages.
acme-project
|- ...
|- project
| |- CustomPlugin.scala
| |- SupportingClass.scala
| |- acme
| | |- plugin
| | | |- PackagedClass.scala
My CustomPlugin seems to be able to use SupportingClass from the same folder, even when that class declares another package. However, I cannot use PackagedClass:
[error] /~/acme-project/project/CustomPlugin.scala:1:8: not found: object acme
[error] import acme.plugin.PackagedClass
[error]        ^
I tried adding a src/main/scala folder but get the same kind of import errors.
So, I would like to know whether there are ways to create large/structured plugins inside the project folder without turning them into a complete, standalone plugin?
I would like to keep the simplicity of this format where I do not have to publish my plugin. Having it inside a dedicated module would be ok.
Thanks
One way to do this is to restructure your plugin as its own sbt project, then load it as a module dependency in your project's plugins.sbt.
Move the code to its own directory. This can be anywhere you like, e.g.:
acme-project/project/my-custom-plugin/
acme-project/my-custom-plugin/
some-other-path/my-custom-plugin/
Write a my-custom-plugin/build.sbt with at least these settings:
enablePlugins(SbtPlugin)
sbtPlugin := true
scalaVersion := "2.12.16"
name := "my-custom-plugin"
Add it to your project as a module dependency in acme-project/project/plugins.sbt:
// For acme-project/project/my-custom-plugin/
val myCustomPlugin =
  project in file("my-custom-plugin")

// For acme-project/my-custom-plugin/
val myCustomPlugin =
  ProjectRef(file("../../my-custom-plugin"), "my-custom-plugin")

// For some-other-path/my-custom-plugin/
val myCustomPlugin =
  ProjectRef(file("/path/to/my-custom-plugin"), "my-custom-plugin")

dependsOn(myCustomPlugin)
You only need one of those val myCustomPlugin = lines, depending on where you put your my-custom-plugin/ directory.

dynamic task id names in Airflow

I have a DAG with one DataflowTemplateOperator that can deal with different json files. When I trigger the DAG I pass some parameters via {{dag_run.conf['param1']}} and it works fine.
The issue I have is trying to rename the task_id based on param1.
i.e. task_id="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
it complains that only alphanumeric characters are allowed,
or
task_id="df_operator_read_object_json_file_{}".format(dag_run.conf['param1']),
it does not recognise dag_run, on top of the alphanumeric issue.
The whole idea behind this is that when I look at the Dataflow jobs console and a job has failed, I can tell who the offender is based on param1. Dataflow job names are based on the task_id, like this:
df-operator-read-object-json-file-8b9eecec
and what I need is this:
df-operator-read-object-param1-json-file-8b9eecec
Any ideas if this is possible?
There is no need to generate a new operator per file.
DataflowTemplatedJobStartOperator has a job_name parameter which is also templated, so it can be used with Jinja.
I didn't test it but this should work:
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

op = DataflowTemplatedJobStartOperator(
    task_id="df_operator_read_object_json_file",
    job_name="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
    template='gs://dataflow-templates/your_template',
    location='europe-west3',
)

If I execute the same test case two times, is there a way to generate report.html with a timestamp in the file name?

I have a question related to report.html that maybe someone could help me clarify.
If I execute the same test case two times, is there a way to generate report.html with a timestamp in the file name, so that after two executions I have two report.html files?
For example:
report_20200529_15:00:00.html
report_20200529_15:05:00.html
Thanks in advance for your help.
This is covered in the Robot Framework User Guide, in the section titled Timestamping output files:
All output files listed in this section can be automatically timestamped with the option --timestampoutputs (-T). When this option is used, a timestamp in the format YYYYMMDD-hhmmss is placed between the extension and the base name of each file.
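For example (a minimal sketch, with an assumed tests/ suite directory), the option can be given on the command line as robot --timestampoutputs tests/, or passed through Robot Framework's programmatic entry point:

# Equivalent to: robot --timestampoutputs tests/
from robot import run

run("tests", timestampoutputs=True)
# Each run now writes e.g. report-20200529-150000.html and
# log-20200529-150000.html instead of overwriting report.html.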

Testing a file upload to a form in a Behat feature file

I am quite new to writing Behat test suites and I am currently trying to flesh out my existing feature file with an added test for uploading a file.
This is what I have come up with so far.
Scenario: Submitting a valid asset form and uploading a file
  When I submit a asset form with values:
    | name               | type  | position | active | file                                 |
    | St Andrews Release | image | 1        | 1      | /web/images/product/icon/default.jpg |
  Then the form should be valid
  And the entity form entity should have the following values
    | name               | type  | position | active | file                                 |
    | St Andrews Release | image | 1        | 1      | /web/images/product/icon/default.jpg |
      Failed asserting that null matches expected '/web/images/product/icon/default.jpg'.
  And the entity form entity should be persisted correctly
This is the method handling the scenario:
/**
 * @When I submit a asset form with values:
 */
public function iSubmitAssetFormWithValues(TableNode $table)
{
    $data = $table->getColumnsHash()[0];
    $this->form = $this->submitTheForm('crmpicco.asset.type', $this->entity, $data);
}
The submitTheForm method returns a Symfony\Component\Form\FormInterface.
Am I along the right lines? I am currently getting an error:
Failed asserting that null matches expected
'/web/images/product/swatch/default.jpg'.
I suggest you create a dedicated folder structure, right in your application root, for the files used in Behat tests, because the tests and the files they use must be consistent for all developers. I sometimes see people writing tests that upload files that only exist on their local desktop :) My desktop and your desktop are different, hence the test would fail.
Structure
football            # your application name/root
    build
        dummy
            document
                hello.doc
                world.xls
            image
                test.jpg
behat.yml
Apart from the other common settings, you must define files_path.
....
....
default:
    extensions:
        Behat\MinkExtension\Extension:
            files_path: %behat.paths.base%/build/dummy/
....
....
Example Gherkin scenario
Feature: I can upload a file which is stored in my generic "dummy" folder

  Scenario: I can upload image
    Given I am on "/"
    When I attach the file "image/test.jpg" to "league_flag"
    And I press "Submit"
    Then I should see "Succeeded."

How can I aggregate SaltStack command results?

Is it possible to run a SaltStack command that, say, looks to see if a process is running on a machine, and aggregates the results of running that command across multiple minions?
Essentially, I'd like to see all the results that are returned from the minions displayed in something like an ASCII table. Is it possible to have an uber-result formatter that waits for all the results to come back, then applies the format? Perhaps there's another approach?
If you want to do this entirely within Salt, I would recommend creating an "outputter" that displays the data how you want.
A "highstate" outputter was recently created that might give you a good starting point. The highstate outputter creates a small summary table of the returned data. It can be found here:
https://github.com/saltstack/salt/blob/develop/salt/output/highstate.py
I'd recommend perusing the code of the other outputters as well.
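As a rough sketch (the module and function names here are illustrative, not an existing outputter): a custom outputter is a small Python module placed where Salt can load it, for example a _output directory under your file roots synced with saltutil.sync_output, and its output() function turns the aggregated return data into the string Salt prints:

# _output/proc_table.py -- illustrative custom outputter; once loadable it can
# be selected with "--out proc_table" on the salt command line.
def output(data, **kwargs):
    '''
    Render the {minion_id: result} dictionary returned by a command
    (e.g. cmd.run) as a simple aligned ASCII table.
    '''
    if not data:
        return 'no returns'
    width = max(len(minion) for minion in data)
    rows = []
    for minion, result in sorted(data.items()):
        # Collapse multi-line results so each minion stays on one row
        flat = ' / '.join(str(result).splitlines())
        rows.append('{0:<{1}} | {2}'.format(minion, width, flat))
    return '\n'.join(rows)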
If you want to use another tool to create this report, I would recommend adding "--out json" to your command at the cli. This will cause Salt to return the data in json format which you can then pipe to another application for processing.
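For instance, a small post-processing script could consume that JSON; the process check and the script name below are assumptions for illustration:

# summarize_returns.py -- e.g.:
#   salt '*' cmd.run 'pgrep -c nginx' --out json --static | python summarize_returns.py
# --static makes Salt emit one JSON document once all minions have returned.
import json
import sys

data = json.load(sys.stdin)   # {"minion1": "2", "minion2": "0", ...}
for minion, result in sorted(data.items()):
    print('{0:<20} {1}'.format(minion, result))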
This was asked a long time ago, but I stumbled across it more than once, and I thought another approach might be useful – use the survey Salt runner:
$ salt-run survey.hash '*' cmd.run 'dpkg -l python-django'
|_
  ----------
  pool:
      - machine2
      - machine4
      - machine5
  result:
      dpkg-query: no packages found matching python-django
|_
  ----------
  pool:
      - machine1
      - machine3
  result:
      Desired=Unknown/Install/Remove/Purge/Hold
      | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
      |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
      ||/ Name           Version      Architecture Description
      +++-==============-============-============-=================================
      ii  python-django  1.4.22-1+deb all          High-level Python web development
