Move onto the next task in Ansible without waiting for other hosts to finish that task

I have noticed in my Ansible playbooks that, if I target multiple hosts with the same playbook, each task must complete on all hosts before the play moves on to the next task.
Is it possible to define the playbook so that a host, once it completes a task, immediately runs the following task without waiting for that task to finish on all of the other hosts?

This behaviour of Ansible can be controlled with strategy plugins.
By default Ansible uses the linear plugin:
All hosts will run each task before any host starts the next task, using the number of forks (default 5) to parallelize.
Another strategy available is free:
Task execution is as fast as possible per host batch as defined by serial (default all). Ansible will not wait for other hosts to finish the current task before queuing more tasks for other hosts; all hosts are still attempted for the current batch, but it does not wait for them to be done before moving on to the next task on the first available host.
Set the strategy like this:
- hosts: all
  strategy: free
  tasks:
If it is not possible to use the free strategy, you could run hosts in batches with the serial directive.

Related

Airflow - How to configure all of a DAG's tasks to run on 1 worker

I have a DAG with 2 tasks:
download_file_from_ftp >> transform_file
My concern is that the tasks can be performed on different workers. The file will be downloaded on the first worker and transformed on another worker; an error will occur because the file is missing on the second worker. Is it possible to configure the DAG so that all of its tasks are performed on one worker?
It's bad practice. Even if you find a workaround, it will be very unreliable.
In general, if your executor allows it, you can configure tasks to execute on a specific worker type. For example, with the CeleryExecutor you can assign tasks to a specific queue. Assuming there is only one worker consuming from that queue, your tasks will be executed on the same worker, BUT the fact that it's one worker doesn't mean it will be the same machine. It highly depends on the infrastructure you use; for example, when you restart your machines, do you get the exact same machine back, or is a new one spawned?
I highly advise you: don't go down this road.
To solve your issue, either download the file to shared storage such as S3 or Google Cloud Storage, so that all workers can read it from the cloud, or combine the download and transform into a single operator so that both actions are executed together, as in the sketch below.
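A minimal sketch of the single-operator approach, assuming Airflow 2-style imports (older versions import PythonOperator from airflow.operators.python_operator); the file path, download and transform logic are purely illustrative placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def download_and_transform(local_path="/tmp/source.csv"):
    # Placeholder for the real FTP download: both steps run inside one task,
    # so they are guaranteed to share the same worker's local disk.
    with open(local_path, "w") as f:
        f.write("raw,data\n")
    # Placeholder transform: read the downloaded file and write the result.
    with open(local_path) as f:
        content = f.read().upper()
    with open(local_path + ".out", "w") as f:
        f.write(content)


with DAG(
    dag_id="ftp_download_and_transform",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="download_and_transform",
        python_callable=download_and_transform,
        # queue="dedicated_worker",  # CeleryExecutor-only alternative: pin the task to one queue
    )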

Is there a way to have 3 sets of worker nodes (groups) for Airflow

We are setting up Airflow for scheduling/orchestration. Currently we have Spark Python loads and non-Spark loads on different servers, and files are pushed to GCP from yet another server. Is there an option to decide which worker nodes the Airflow tasks are submitted to? Currently we are using SSH connections to run all workloads. Our processing is mostly on-prem.
We use the Celery executor model. How do we make sure that a specific task runs on its appropriate node?
Task 1 runs on a non-Spark server (no Spark binaries available).
Task 2 executes a PySpark submit (this server has the Spark binaries).
Task 3 pushes the files created by task 2 from another server (only this one has the GCP utilities installed to push the files, for security reasons).
If I create a DAG, is it possible to specify that a task should execute on a given set of worker nodes?
Currently we have a wrapper shell script for each task and make three SSH runs to complete these processes. We would like to avoid such wrapper shell scripts and instead use the built-in PythonOperator, SparkSubmitOperator, SparkJdbcOperator and SFTPToGCSOperator, while making sure each task runs on a specific server or set of worker nodes.
In short, can we have 3 worker node groups and make each task execute on a group of nodes based on the operation?
You can assign a queue to each worker node group.
Start each Airflow worker with its queue specified:
airflow worker -q sparkload
airflow worker -q non-sparkload
airflow worker -q gcpload
Then start each task with the corresponding queue specified, as in the sketch below. A similar thread was found as well:
How can Airflow be used to run distinct tasks of one workflow in separate machines?
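A minimal sketch of referencing those queues from a DAG, assuming the three workers above are running; BashOperator with illustrative commands, paths and bucket names stands in for the specific Spark/GCS operators:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator


with DAG(
    dag_id="three_worker_groups",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    prepare = BashOperator(
        task_id="non_spark_load",
        bash_command="echo 'non-Spark processing'",
        queue="non-sparkload",  # picked up only by the worker started with -q non-sparkload
    )
    spark_job = BashOperator(
        task_id="spark_submit_load",
        bash_command="spark-submit /opt/jobs/transform.py",
        queue="sparkload",      # picked up only by the worker that has the Spark binaries
    )
    push = BashOperator(
        task_id="push_to_gcp",
        bash_command="gsutil cp /data/out/* gs://example-bucket/",
        queue="gcpload",        # picked up only by the worker with the GCP utilities
    )
    prepare >> spark_job >> push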

Airflow resource pool usage on DAG-level?

I'm looking at using Airflow for scheduling test-case execution against shared hardware in a lab and have some best-practice questions on how to use the resource pool concept for a whole DAG instance instead of just at the task level.
Basically, a test case (executed as an instance of a test-case DAG: deploy/execute/collect/un-deploy) needs certain physical resources and should therefore request them from the different resource pools (modelling the physical resources) in order not to run into conflicting concurrent usage with other triggered DAG instances.
My question is whether it's possible to define resource usage at the DAG-instance level, or only at the task level. If the latter, would one parallel task claiming the resource during the whole DAG-instance execution be the best way to avoid having to pass the resource claim between all tasks in the DAG? Other alternatives?
Update after questions from Viraj and dlamblin:
Running 1.10.1
Running LocalExecutor
Have verified that I can run parallel DAGS with concurrent tasks
The resources I want custom pools for are not worker resources, but rather peripheral hardware units such as relays, routers, etc. that the tasks running in parallel on the LocalExecutor should block on if they are occupied (0 custom resource pool slots left) by other task(s)
The Kubernetes Executor allows certain node-type affinity to be configured at the task or DAG level. The Celery Executor has a queue concept to select a worker group with certain resources available to it. You're probably not using a Local Executor, as your question doesn't quite make sense for that case.
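For reference, pools are only attachable at the task level. A minimal sketch of the work-around the question describes, where every task touching a given hardware unit claims a slot from an assumed pre-created pool (hypothetically named relay_pool, with one slot), might look like this:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def touch_hardware(step):
    # Placeholder for the real deploy/execute/collect/un-deploy logic.
    print(f"{step}: exclusive access to the relay while this task holds the pool slot")


with DAG(
    dag_id="hw_testcase",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    previous = None
    for step in ["deploy", "execute", "collect", "un_deploy"]:
        task = PythonOperator(
            task_id=step,
            python_callable=touch_hardware,
            op_args=[step],
            pool="relay_pool",  # the slot is held only while each task runs, not for the whole DAG run
        )
        if previous:
            previous >> task
        previous = task

Because each task releases its slot when it finishes, another DAG run can grab the hardware between tasks, which is exactly why the question asks about a DAG-level claim or a long-running parallel "claim" task.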

Sharing large intermediate state between Airflow tasks

We have an Airflow deployment with Celery executors.
Many of our DAGs require a local processing step of some file in a BashOperator or PythonOperator.
However, in our understanding the tasks of a given DAG may not always be scheduled on the same machine.
The options for state sharing between tasks I've gathered so far:
Use Local Executors - this may suffice for one team, depending on the load, but may not scale to the wider company
Use XCom - does this have a size limit? Probably unsuitable for large files
Write custom Operators for every combination of tasks that need local processing in between. This approach reduces modularity of tasks and requires replicating existing operators' code.
Use Celery queues to route DAGs to the same worker (docs) - This option seems attractive at first, but what would be an appropriate way to set it up in order to avoid routing everything to one executor, or crafting a million queues?
Use a shared network storage in all machines that run executors - Seems like an additional infrastructure burden, but is a possibility.
What is the recommended way to do sharing of large intermediate state, such as files, between tasks in Airflow?
To clarify something: no matter how you set up Airflow, there will only be one executor running.
The executor runs on the same machine as the scheduler.
Currently (Airflow 1.9.0 at the time of writing) there is no safe way to run multiple schedulers, so there will only ever be one executor running.
The Local executor executes tasks on the same machine as the scheduler.
The Celery Executor just puts tasks in a queue to be worked on by the celery workers.
However, the question you are asking does apply to Celery workers. If you use the Celery Executor you will probably have multiple celery workers.
Using network shared storage solves multiple problems:
Each worker machine sees the same dags because they have the same dags folder
Results of operators can be stored on a shared file system
The scheduler and webserver can also share the dags folder and run on different machines
I would use network storage and write the output file name to XCom. Then, when you need the output from a previous task, you would read the file name from that task's XCom and process that file, as in the sketch below.
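A minimal sketch of that hand-off, assuming a hypothetical /mnt/shared mount visible to every worker and Airflow 2-style context passing (older versions need provide_context=True on the PythonOperator):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

SHARED_DIR = "/mnt/shared"  # assumed network mount reachable from every worker


def produce(**context):
    path = f"{SHARED_DIR}/intermediate_{context['ds_nodash']}.csv"
    with open(path, "w") as f:
        f.write("col_a,col_b\n1,2\n")
    # Only the (small) file path goes through XCom, not the file contents.
    context["ti"].xcom_push(key="file_path", value=path)


def consume(**context):
    path = context["ti"].xcom_pull(task_ids="produce_file", key="file_path")
    with open(path) as f:
        print(f.read())


with DAG(
    dag_id="shared_storage_handoff",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    produce_task = PythonOperator(task_id="produce_file", python_callable=produce)
    consume_task = PythonOperator(task_id="consume_file", python_callable=consume)
    produce_task >> consume_task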
Another option: change the datatype of the value column in the xcom table of the Airflow metastore.
Its default datatype (on MySQL) is BLOB.
Change it to LONGBLOB, which lets you store up to 4 GB of intermediate data between tasks.

Can a salt master provide a state to unavailable minions?

Background: I have several servers which run a service I develop. All of them should have the same copy of the service.
To ensure deployment and up-to-dateness I use Ansible, with an idempotent playbook which deploys the service. Since the servers are on an unreliable network, I have to run the playbook periodically (in a cron job) to reach the servers which may not have been available before.
Problem: I was under the impression that the SaltStack philosophy is different: I thought I could just "set a state, compile it and offer it to a set of minions. These minions would then, at their own leisure, come to the master and get whatever they need to do".
This does not seem to be the case: the minions which were not available at deployment time are skipped.
Question: is there a mechanism which would allow for an asynchronous deployment, in the sense that a state set on the master one time only would then be pulled and applied by the minions (to themselves) once they are ready / can reach the master?
Specifically, without the need to continuously re-offer the same state to all minions, in the hope that the ones which were unavailable in the past are now able to get the update.
Each time a minion connects to the master there is an event on the event bus which you can react upon.
Reactor
This is the main difference between Ansible and Saltstack.
In order to do what you want, I would react on each minion's reconnect and try to apply a state which is idempotent.
Idempotent
You could also set up a scheduled task in SaltStack that runs the state every X minutes and applies the desired configuration.
Scheduled task
The answer from Daniel Wallace (Salt developer):
That is not possible.
The minions connect to the publish port/bus and the master puts new jobs on that bus. Then the minion picks the job up and runs it; if the minion is not connected when the job is published, it will not see the job.
