I have an R script that I need to run once every week. I need it to be done using Azure. One option is to use Azure Data Factory and set up a pipline what will run this R script on a VM (Windows).
The problem I'm facing is that I will have to update every now and then both the R script and the R packages the R script is using.
When setting up this pipeline I will have to generalize the VM (correct me if I'm wrong) and doing so I can no longer log into this VM. And if I can't log into this VM I cannot update the R packages.
What options do I have here?
There is an alternate solution where you can use Azure Batch Service and Azure Data Factory to execute R scripts in your pipeline.
For more information, you can refer this blog: How to execute R Scripts using Azure Batch Services and Azure Data Factory? | by Aditya Kaushal | Medium
Alternatively, to run R scripts in your VM, you can use below options:
Custom Script Extension
Run Command
Hybrid Runbook Worker
Serial console
Reference: Run scripts in an Azure Windows VM - Azure Virtual Machines | Microsoft Docs
I am trying to automate the flow of calling Spotify. I am nearly there but I am held up at one point.
I am using Airflow (ETL tool) to run my python scripts.
I am able to connect using the spotipy module but it cannot read/access the .cache-'username' file which it needs to. I am running on a Debian Google Cloud Compute instance.
The error is
Couldn't write token to cache at: .cache-'username'
I'm creating an app where a user can create a piece of data that could be presented in the ui at a later date. I'm trying to find a way to create cron entries dynamically using either java code (for android devices) or node.js code (firebase cloud function generates a cron job). I haven't been able to find a way to do it dynamically and based on what I read it may not be possible. Does anyone out there know a way?
Presently the only way to create GAE cron jobs is via deployment of a cron configuration file (by itself or, in some cases, together with the app code). Which today can only be done via CLI tools (from the GAE or Cloud SDKs).
I'm unsure if you'd consider programmatically invoking such CLI tools qualifying to 'create cron entries dynamically'. If you do - generation the cron config file and scripting the desired CLI-based deployment would be an approach.
But creating jobs via an API is still a feature request, see How to schedule repeated jobs or tasks from user parameters in Google App Engine?. It also contains another potentially acceptable approach (along the same lines as the one mentioned in #ceejayoz's comment))
We are running a larger scale project on Firebase and have already invested in unit tests. Now, we are also using the experimental shell to run integration tests against a testing environment and database. We would very much like to invoke functions via bash and shell script instead of opening the experimental shell, requiring our testdata and invoking each function manually. We tried reverse-engineering the firebase tools for that matter but this seems to be overkill. Any idea how we might be able to test-run all our functions in series?
I have an R script that I run every day that scrapes data from a couple of different websites, and then writes the data scraped to a couple of different CSV files. Each day, at a specific time (that changes daily) I open RStudio, open the file, and run the script. I check that it runs correctly each time, and then I save the output to a CSV file. It is often a pain to have to do this everyday (takes ~10-15 minutes a day). I would love it if someway I could have this script run automatically at a pre-defined specific time, and a buddy of mine said AWS is capable of doing this?
Is this true? If so, what is the specific feature / aspect of AWS that is able to do this, this way I can look more into it?
Thanks!
Two options come to mind thinking about this:
Host a EC2 Instance with R on it and configure a CRON-Job to execute your R-Script regularly.
One easy way to get started: Use this AMI.
To execute the script R offers a CLI rscript. See e.g. here on how to set this up
Go Serverless: AWS Lambda is a hosted microservice. Currently R is not natively supported but on the official AWS Blog here they offer a step by step guid on how to run R. Basically you execute R from Python using the rpy2-Package.
Once you have this setup schedule the function via CloudWatch Events (~hosted cron-job). Here you can find a step by step guide on how to do that.
One more thing: You say that your function outputs CSV files: To save them properly you will need to put them to a file-storage like AWS-S3. You can do this i R via the aws.s3-package. Another option would be to use the AWS SDK for python which is preinstalled in the lambda-function. You could e.g. write a csv file to the /tmp/-dir and after the R script is done move the file to S3 via boto3's S3 upload_file function.
IMHO the first option is easier to setup but the second-one is more robust.
It's a bit counterintuitive but you'd use Cloudwatch with an event rule to run periodically. It can run a Lambda or send a message to an SNS topic or SQS queue. The challenge you'll have is that a Lambda doesn't support R so you'd either have to have a Lambda kick off something else or have something waiting on the SNS topic or SQS queue to run the script for you. It isn't a perfect solution as there are, potentially, quite a few moving parts.
#stdunbar is right about using CloudWatch Events to trigger a lambda function. You can set a frequency of the trigger or use a Cron. But as he mentioned, Lambda does not natively support R.
This may help you to use R with Lambda: R Statistics ready to run in AWS Lambda and x86_64 Linux VMs
If you are running windows, one of the easier solution is to write a .BAT script to run your R-script and then use Window's task scheduler to run as desired.
To call your R-script from your batch file use the following syntax:
C:\Program Files\R\R-3.2.4\bin\Rscript.exe" C:\rscripts\hello.R
Just verify the path to the "RScript" application and your R code is correct.
Dockerize your script (write a Dockerfile, build an image)
Push the image to AWS ECR
Create an AWS ECS cluster and AWS ECS task definition within the cluster that will run the image from AWS ECR every time it's spun-up
Use EventBridge to create a time-based trigger that will run the AWS ECS task definition
I recently gave a seminar walking through this at the Why R? 2022 conference.
You can check out the video here: https://www.youtube.com/watch?v=dgkm0QkWXag
And the GitHub repo here: https://github.com/mrismailt/why-r-2022-serverless-r-in-the-cloud