Is there any way to run a Python script asynchronously in Airflow?

I want to run a Python script in Airflow. To achieve this, I am triggering the script using the Airflow BashOperator like below:
sub_dag = BashOperator(
    task_id='execute_dependent_dag',
    bash_command='python myscript.py',
    dag=dag,
    trigger_rule="all_success")
However, I want it to be triggered asynchronously. Currently the task waits for the script to finish. I tried using & as well as nohup to make it run in the background, but it didn't work.
Let me know if there is any other way to run it asynchronously.
Thanks in advance.

I believe extending BashOperator to remove the wait() call would make that happen, with the downside that errors would go silently undetected.
Alternatively, if the Python script/code in question can be imported into your Airflow project, you could try doing the same with a PythonOperator that launches the work in a separate process, via multiprocessing (or a variety of other ways); a rough sketch follows.
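For instance, a minimal sketch of the PythonOperator approach, assuming Airflow 1.x-style imports and that myscript.py is reachable on the worker; using subprocess.Popen here instead of multiprocessing is my own substitution, and failures in the script will not be reported back to Airflow:

import subprocess
from airflow.operators.python_operator import PythonOperator

def launch_script():
    # Start the script and return immediately instead of blocking on it.
    # start_new_session=True detaches it from the task's process group, so
    # the worker does not kill it when the task finishes.
    subprocess.Popen(['python', 'myscript.py'], start_new_session=True)

execute_dependent_dag = PythonOperator(
    task_id='execute_dependent_dag',
    python_callable=launch_script,
    trigger_rule='all_success',
    dag=dag)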
Also, if you want to get your hands dirty, have a look at this.

Related

Airflow UI pause toggle not showing for tutorial

I am following the tutorial in the Airflow docs. When I visit the UI I don't see the toggle to turn the DAGs on and off (or pause them?).
I tried clicking the trigger DAG button on the right, but I guess this just runs it manually once, ignoring the scheduler. (A side question: it just says it's running now and isn't finishing... is it waiting for something?)
So, did I have to do something to schedule the DAG first, and is that why I'm not seeing a pause button, because it isn't scheduled? That would surprise me, because surely I should be able to schedule it from the UI?
Lastly, what are all those other example DAGs and how can I hide them?
It seems to me that some part of your Airflow setup is broken.
Either the scheduler is not working or the DAG files are not deployed.
My suggestion is to check this question as well: Airflow 1.9.0 is queuing but not launching tasks

Running R scripts in Airflow?

Is it possible to run an R script as an Airflow DAG? I have tried looking online for documentation on this but am unable to find any. Thanks.
There doesn't seem to be an R operator right now.
You could either write your own and contribute it to the community, or simply run your task as a BashOperator calling Rscript.
Another option is to containerize your R script and run it using the DockerOperator, which is included in the standard distribution. This removes the need to have your worker nodes configured with the correct version of R and any needed R libraries.
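A rough sketch of that approach, assuming Airflow 1.x-style imports; the image name and script path are placeholders for an image that already contains R, the required libraries, and the script:

from airflow.operators.docker_operator import DockerOperator

run_r_script = DockerOperator(
    task_id='run_r_script',
    image='my-registry/my-r-job:latest',    # hypothetical image with R and packages installed
    command='Rscript /scripts/myscript.R',  # hypothetical script path inside the image
    dag=dag)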
Use the BashOperator for executing R scripts.
For example:
opr_hello = BashOperator(
    task_id='xyz',
    bash_command='Rscript Pathtofile/file.r')
There is an open pull request for an R operator; it is still waiting to be incorporated.
https://github.com/apache/incubator-airflow/pull/3115/files

Only works with the CeleryExecutor

I am new to Airflow, and when I click Run with 'ignore all dependencies' on the Task Instance context menu, like this:
Task Instance Context Menu
it leads to the error 'Only works with the CeleryExecutor'.
I tried refreshing the web UI, but that doesn't help.
(I use the LocalExecutor and don't want to use the CeleryExecutor.)
Why does this happen, and how can I run a single task ignoring all dependencies from the web UI when I use the LocalExecutor?
I had a similar problem. The issue was the following:
With the LocalExecutor you cannot run a single task; you can only run the whole DAG at once. Source code
The DAG was already in the 'success' state.
A possible solution is to change the DAG's status to running.
I worked around this issue by selecting the first task in my DAG and marking all downstream tasks as success.
I would then clear the task I actually wanted to run, and the scheduler would pick it up and run that task for me.

Can an Rscript be aware of other Rscripts in Ubuntu?

I have a cron job that launches an Rscript to fetch some data and stash it away in a database. Sometimes the source isn't ready, so the script waits. Sometimes the source skips a data point, so the script ends up waiting until another instance is started from cron. Once two instances are running, they cause problems for each other. Is there something like the following pseudocode that I could put at the top of my scripts so they stop when they see that another instance of themselves is already running:
stopifnot(Sys.running('nameofscript.r'))
One thing I thought of doing would be for the script to make a temp file with a fixed name at the start and then delete that temp file at the end. That way the script can check for the existence of the file to know whether it's already running. I also see there's something called flock, which is probably a better solution than that, but I'd prefer a way R can do it without bash commands (I've never really learned any bash). If there isn't a pure R way to do this, can the R script call flock on itself, or would it only work if the cron task calls a bash script that then calls the Rscript? If it isn't already obvious, I don't really know how to use flock.

R - Run source() in background

I want to execute an R script in the background from the R console.
From the console, I usually run an R script as source('~/.active-rstudio-document').
I have to wait until the script completes before going ahead with the rest of my work.
Instead, I want R to run the script in the background while I continue with my work in the console.
I should also somehow be notified when R completes the source command.
Is this possible in R?
This would be quite useful, as we often see jobs taking a long time.
PS: I want the sourced script to run in the same memory space rather than a new one. Hence solutions like fork, system, etc. won't work for me. I am seeing if I can run the R script as a separate thread rather than a separate process.
You can use system() and Rscript to run your script as an asynchronous background process:
system("Rscript -e 'source(\"your-script.R\")'", wait=FALSE)
At the end of your script, you may save your objects with save.image() in order to load them later, and notify yourself of its completion with cat():
...
save.image("script-output.RData")
cat("Script completed\n\n")
Hope this helps!
