I have an R script that I need to run once every week, and it needs to run on Azure. One option is to use Azure Data Factory and set up a pipeline that will run this R script on a Windows VM.
The problem I'm facing is that I will have to update both the R script and the R packages it uses every now and then.
When setting up this pipeline I will have to generalize the VM (correct me if I'm wrong), and once the VM is generalized I can no longer log into it. And if I can't log into the VM, I can't update the R packages.
What options do I have here?
There is an alternative solution where you can use the Azure Batch service together with Azure Data Factory to execute R scripts in your pipeline.
For more information, you can refer to this blog: How to execute R Scripts using Azure Batch Services and Azure Data Factory? | by Aditya Kaushal | Medium
Alternatively, to run R scripts on your VM, you can use one of the options below (a sketch of the Run Command option follows the reference link):
Custom Script Extension
Run Command
Hybrid Runbook Worker
Serial console
Reference: Run scripts in an Azure Windows VM - Azure Virtual Machines | Microsoft Docs
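If you go with the Run Command route, here is a minimal hedged sketch of invoking it from the Azure Python SDK (track-2 azure-mgmt-compute); the subscription ID, resource group, VM name and script path are placeholders, and the same call is also available from the CLI as "az vm run-command invoke".

    # Hedged sketch: trigger an R script on the Windows VM via Run Command.
    # <subscription-id>, <resource-group>, <vm-name> and the script path are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

    poller = compute.virtual_machines.begin_run_command(
        "<resource-group>",
        "<vm-name>",
        {
            "command_id": "RunPowerShellScript",
            "script": ["Rscript C:\\scripts\\weekly_job.R"],
        },
    )
    result = poller.result()
    print(result.value[0].message)  # stdout/stderr of the command on the VM

This way the VM stays a normal, non-generalized VM that you can still log into, so updating the R packages is just another Run Command (or an interactive RDP session).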
I am looking for all the possible options to deploy an application written using R and the Shiny package on AWS. I see one of the options is to use EC2, but I would like to know if there is any serverless option available to deploy R applications on AWS.
Does SageMaker RStudio have an option to deploy? I looked into the AWS documentation, but unfortunately I could not see an option.
Any leads would be much appreciated.
Using RStudio on SageMaker (or simply R kernels in SageMaker Studio / Notebooks) enables several deployment strategies for RESTful application endpoints.
In this example, you can see a trained R model deployed to a SageMaker serverless endpoint:
https://github.com/aws-samples/rstudio-on-sagemaker-workshop/blob/main/03_SageMakerPipelinesAndDeploy/01_Pipeline.Rmd#L124-L149
In this example, you will deploy a trained R model to a 24/7 RESTful endpoint using SageMaker real-time endpoints:
https://github.com/aws/amazon-sagemaker-examples/tree/main/r_examples/r_serving_with_plumber
In order to build the serving containers for those models, you can check the above examples, or you may want to use the recently launched functionality in the native R vetiver package:
https://github.com/juliasilge/vetiverdemo/blob/main/biv-svm/train_sagemaker.Rmd
If you're looking to deploy Shiny applications and not models, you can use EC2 or a container service like ECS/Fargate to host Shiny Server or RStudio Connect. These blogs may be a bit overkill for what you're looking to do, but they should have good examples of achieving hosted applications:
https://aws.amazon.com/blogs/machine-learning/host-rstudio-connect-and-package-manager-for-ml-development-in-rstudio-on-amazon-sagemaker/
https://aws.amazon.com/blogs/architecture/scaling-rstudio-shiny-using-serverless-architecture-and-aws-fargate/
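For completeness, this is roughly what the serverless deployment in the first link boils down to, expressed with the SageMaker Python SDK rather than R. It is only a hedged sketch: image_uri, model_data and the role ARN are placeholders for the serving container, model artifact and execution role you would have created.

    # Hedged sketch: put a model artifact behind a SageMaker serverless endpoint.
    # All identifiers below are placeholders, not values from the linked examples.
    import sagemaker
    from sagemaker.model import Model
    from sagemaker.serverless import ServerlessInferenceConfig

    session = sagemaker.Session()

    model = Model(
        image_uri="<ecr-uri-of-the-r-serving-container>",
        model_data="s3://<bucket>/model.tar.gz",
        role="<sagemaker-execution-role-arn>",
        sagemaker_session=session,
    )

    # Serverless inference: you pay per invocation instead of running an instance 24/7
    predictor = model.deploy(
        serverless_inference_config=ServerlessInferenceConfig(
            memory_size_in_mb=2048,
            max_concurrency=5,
        )
    )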
Currently some of our jobs are running on different Windows VMs.
For example:
Task Scheduler to run:
PowerShell files
.bat files
Python files
SQL Agent jobs to run SSIS packages
We are planning to use Airflow to trigger all these jobs to have better visibility and manage dependencies.
Our Airflow runs on Ubuntu.
I would like to know if there is any way to trigger the above-mentioned jobs on Windows via Airflow.
Can I get some examples of how to achieve this? Please suggest which packages/libraries/plugins/operators I can use.
Yes, there is. I would start by looking into the WinRM operator and hook that you find under Microsoft in the providers section:
http://airflow.apache.org/docs/apache-airflow-providers-microsoft-winrm/stable/index.html
and maybe also:
https://github.com/diyan/pywinrm
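As a rough illustration (not a drop-in DAG), this is the shape a WinRM-based DAG could take on Airflow 2.x; the connection ID and the Windows paths are placeholders you would replace with your own.

    # Hedged sketch of a DAG that runs existing Windows jobs over WinRM.
    # "winrm_windows_vm" is an assumed Airflow connection; the paths are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.microsoft.winrm.hooks.winrm import WinRMHook
    from airflow.providers.microsoft.winrm.operators.winrm import WinRMOperator

    with DAG(
        dag_id="windows_jobs",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 6 * * *",
        catchup=False,
    ) as dag:
        hook = WinRMHook(ssh_conn_id="winrm_windows_vm")

        run_bat = WinRMOperator(
            task_id="run_bat_job",
            winrm_hook=hook,
            command="C:\\jobs\\nightly.bat",
        )

        run_ps = WinRMOperator(
            task_id="run_powershell_job",
            winrm_hook=hook,
            command="powershell.exe -File C:\\jobs\\cleanup.ps1",
        )

        run_bat >> run_ps

SQL Agent jobs can be started the same way by having the command call sqlcmd with msdb.dbo.sp_start_job, and Python scripts by invoking python.exe on the VM.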
I was doing a simulation study in RStudio using the doAzureParallel package. It uses Virtual Machines to do the work and finally merges the results back into my RStudio workspace. However, RStudio suddenly stopped working. When I log in to the Microsoft Azure Portal I can still see that the jobs are running, and it shows the number of simulations running, completed, etc. Once I restart RStudio, is there any way to get the R output from the Azure portal? I am new to Azure, and I hope some experienced people can help.
Note: you can only monitor the jobs using the Azure Portal.
Once the job submission object is passed to the Submit call in the SDK, the workload is submitted to the Batch service that was created through the portal. The job can then be monitored via the portal, or from the client by getting the details of the job using the Job ID that was returned at submission.
There is also an article that covers the Azure services supporting the R language:
The doAzureParallel package is a parallel backend for the widely popular foreach package. With doAzureParallel, each iteration of the foreach loop runs in parallel on an Azure Virtual Machine (VM), allowing users to scale up their R jobs to tens or hundreds of machines.
This tutorial shows you how to deploy a Batch pool and run a parallel R job in Azure Batch directly within RStudio.
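Since doAzureParallel runs on Azure Batch under the hood, the job it submitted can also be inspected outside R. Below is a hedged sketch using the azure-batch Python SDK, where the account name, key, URL and job ID are placeholders (the job ID is shown when the foreach job is submitted). Note that this only inspects job and task state; the merged results still come back through doAzureParallel in R.

    # Hedged sketch: check on the Batch job that doAzureParallel created.
    # Account name, key, URL and job ID are placeholders.
    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials

    creds = SharedKeyCredentials("<batch-account-name>", "<batch-account-key>")
    client = BatchServiceClient(creds, batch_url="https://<account>.<region>.batch.azure.com")

    job = client.job.get("<job-id>")
    print(job.id, job.state)

    # Per-task state: how many simulations are running / completed
    for task in client.task.list("<job-id>"):
        print(task.id, task.state)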
I have an R script that I run every day; it scrapes data from a couple of different websites and then writes the scraped data to a couple of different CSV files. Each day, at a specific time (that changes daily), I open RStudio, open the file, and run the script. I check that it runs correctly each time, and then I save the output to a CSV file. It is often a pain to have to do this every day (it takes ~10-15 minutes a day). I would love it if I could somehow have this script run automatically at a pre-defined time, and a buddy of mine said AWS is capable of doing this?
Is this true? If so, which specific feature/aspect of AWS is able to do this, so I can look into it further?
Thanks!
Two options come to mind thinking about this:
Host an EC2 instance with R on it and configure a cron job to execute your R script regularly.
One easy way to get started: Use this AMI.
To execute the script, R offers a CLI, Rscript. See e.g. here for how to set this up.
Go serverless: AWS Lambda is a serverless, hosted compute service. Currently R is not natively supported, but on the official AWS blog here they offer a step-by-step guide on how to run R. Basically, you execute R from Python using the rpy2 package.
Once you have this set up, schedule the function via CloudWatch Events (~ a hosted cron job). Here you can find a step-by-step guide on how to do that.
One more thing: you say that your function outputs CSV files. To save them properly you will need to put them in a file storage like AWS S3. You can do this in R via the aws.s3 package. Another option would be to use the AWS SDK for Python, which is preinstalled in the Lambda function. You could, e.g., write a CSV file to the /tmp/ dir and, after the R script is done, move the file to S3 via boto3's S3 upload_file function (see the sketch below).
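To make that concrete, here is a hedged sketch of what such a Lambda handler could look like, assuming the R runtime and rpy2 are available as described in the linked blog post; the script name, bucket and key are placeholders, and the R script is assumed to write its CSV to /tmp.

    # Hedged sketch: run a bundled R script via rpy2, then push its CSV output to S3.
    # "scrape.R", the bucket and the key are placeholders.
    import boto3
    import rpy2.robjects as robjects

    s3 = boto3.client("s3")

    def handler(event, context):
        # Source the R script; it is assumed to write /tmp/output.csv
        robjects.r('source("scrape.R")')

        # /tmp is the only writable path in Lambda, so move the result to S3
        s3.upload_file("/tmp/output.csv", "<my-output-bucket>", "daily/output.csv")
        return {"status": "ok"}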
IMHO the first option is easier to set up, but the second one is more robust.
It's a bit counterintuitive, but you'd use CloudWatch with an event rule to run periodically. It can run a Lambda or send a message to an SNS topic or SQS queue. The challenge you'll have is that Lambda doesn't support R natively, so you'd either have to have a Lambda kick off something else or have something waiting on the SNS topic or SQS queue to run the script for you. It isn't a perfect solution, as there are potentially quite a few moving parts.
@stdunbar is right about using CloudWatch Events to trigger a Lambda function. You can set a frequency for the trigger or use a cron expression. But as he mentioned, Lambda does not natively support R.
This may help you to use R with Lambda: R Statistics ready to run in AWS Lambda and x86_64 Linux VMs
If you are running Windows, one of the easier solutions is to write a .bat script that runs your R script and then use Windows Task Scheduler to run it as desired.
To call your R script from your batch file, use the following syntax:
"C:\Program Files\R\R-3.2.4\bin\Rscript.exe" C:\rscripts\hello.R
Just verify that the path to the Rscript executable and to your R code is correct.
Dockerize your script (write a Dockerfile, build an image)
Push the image to AWS ECR
Create an AWS ECS cluster and an AWS ECS task definition within the cluster that will run the image from AWS ECR every time it is spun up
Use EventBridge to create a time-based trigger that will run the AWS ECS task definition (see the sketch right after this list)
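For the EventBridge step, a hedged sketch of the scheduling calls via boto3 is below; every ARN, subnet ID and name is a placeholder, and the rule assumes a Fargate launch type.

    # Hedged sketch: a daily EventBridge rule that launches the ECS task definition.
    # All ARNs, IDs and names are placeholders.
    import boto3

    events = boto3.client("events")

    # Fire every day at 06:00 UTC
    events.put_rule(
        Name="daily-r-script",
        ScheduleExpression="cron(0 6 * * ? *)",
        State="ENABLED",
    )

    # Point the rule at the Fargate task definition that wraps the R image
    events.put_targets(
        Rule="daily-r-script",
        Targets=[
            {
                "Id": "r-script-task",
                "Arn": "arn:aws:ecs:<region>:<account-id>:cluster/<cluster-name>",
                "RoleArn": "arn:aws:iam::<account-id>:role/<ecs-events-role>",
                "EcsParameters": {
                    "TaskDefinitionArn": "arn:aws:ecs:<region>:<account-id>:task-definition/<task>:1",
                    "TaskCount": 1,
                    "LaunchType": "FARGATE",
                    "NetworkConfiguration": {
                        "awsvpcConfiguration": {
                            "Subnets": ["<subnet-id>"],
                            "AssignPublicIp": "ENABLED",
                        }
                    },
                },
            }
        ],
    )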
I recently gave a seminar walking through this at the Why R? 2022 conference.
You can check out the video here: https://www.youtube.com/watch?v=dgkm0QkWXag
And the GitHub repo here: https://github.com/mrismailt/why-r-2022-serverless-r-in-the-cloud
The user_data section in HOT is used to run shell commands on a newly created Nova instance; however, I want to run a local shell command (on the machine that runs the "heat stack-create" command) when performing "heat stack-create".
But after reading through the OpenStack HOT documentation, I didn't find any syntax to support this.
There is no syntax to support this. Heat templates are processed on the server. There is also no guarantee that the submitting system even has a standard Unix shell (heat templates can be submitted via a web browser, or with tools other than the heat CLI).
If your deployment workflow requires coordinating some local action with the Heat deployment, your best bet is to use another tool (Ansible? A shell script?) to automate the workflow.
An alternative would be to have Heat (or your VMs) signal your local system using some sort of web hook, and have a web service running locally that would receive the signal and perform whatever action is appropriate.
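As a rough illustration of that last idea, the listener on your local machine could be as small as the sketch below; the URL path and the local command are made up for the example, and the Heat/VM side would POST to this URL (e.g. with curl from user_data).

    # Hedged sketch: a tiny local web hook receiver that runs a local command
    # when the deployment calls back. Path and command are placeholders.
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class HookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path == "/stack-created":
                # Run whatever local command your workflow needs
                subprocess.run(["./post_deploy.sh"], check=False)
            self.send_response(200)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), HookHandler).serve_forever()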