I have created an R job and placed it on a server. I am using Airflow to schedule that R job. All my R libraries are installed in a virtual environment created on the server. I have a bash script that calls the R job I want to schedule.
I am facing an issue with activating the virtual environment.
I'm fairly new to CI/CD pipelines with Git and hope that my question is related to that functionality. I want to achieve the following:
Whenever a new commit (or tag) of a specific package on our company's local Git server is created, I want to execute a shell script on a remote server. This shell script should download the new version of the package and install it into the global R installation, making it available to the different processes that use this R environment. So far this has been a manual step, but I would really like to automate the process. Is this possible?
Thanks!
I have an Airflow instance on a local Ubuntu machine. This instance doesn't work very well, so I would like to install it again. The problem is that I can't delete the current instance, because it is used by other people, so I would like to create a new Airflow instance on the same machine and put various DAGs there.
How could I do it? I created a different virtual environment, but I don't know how to install a second Airflow server in that environment so that it runs in parallel with the current one.
Thank you!
Use a different port for the webserver.
Use a different AIRFLOW_HOME variable.
Use a different sql_alchemy_conn (to point to a different database).
Copy whatever deployment setup you use to start and stop your Airflow components.
Depending on your deployment, you might record the process IDs of your running Airflow somewhere (so-called pid files) or have some other way to determine which processes are running. But that is nothing Airflow-specific; it is specific to your deployment.
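As a rough sketch of the first three points, the overrides can be collected as environment variables and passed to the second instance's processes. This assumes Airflow 2.x, where options can be overridden with AIRFLOW__SECTION__KEY variables; the home directory, port and database URL below are hypothetical placeholders.

```python
import os


def second_instance_env(airflow_home, web_port, db_url, base_env=None):
    """Return a copy of the environment with overrides for a second Airflow instance."""
    env = dict(os.environ if base_env is None else base_env)
    # Separate home directory: own airflow.cfg, logs and dags folder
    env["AIRFLOW_HOME"] = airflow_home
    # Separate webserver port so it does not clash with the existing instance
    env["AIRFLOW__WEBSERVER__WEB_SERVER_PORT"] = str(web_port)
    # Separate metadata database. Airflow 2.3+ reads this from the [database]
    # section; older 2.x releases use AIRFLOW__CORE__SQL_ALCHEMY_CONN instead.
    env["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] = db_url
    return env


# Hypothetical values for illustration only
example_env = second_instance_env(
    "/opt/airflow2",
    8081,
    "postgresql+psycopg2://airflow2:secret@localhost/airflow2",
    base_env={},
)
```

The resulting dictionary could then be passed as env= to subprocess.Popen when starting the webserver and scheduler from the second virtual environment, so the existing instance is never touched.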
I have an R script that I need to run once every week. I need it to be done using Azure. One option is to use Azure Data Factory and set up a pipeline that will run this R script on a VM (Windows).
The problem I'm facing is that every now and then I will have to update both the R script and the R packages it uses.
When setting up this pipeline I will have to generalize the VM (correct me if I'm wrong), and after doing so I can no longer log into this VM. And if I can't log into the VM, I cannot update the R packages.
What options do I have here?
There is an alternative solution: you can use Azure Batch Service and Azure Data Factory to execute R scripts in your pipeline.
For more information, you can refer to this blog: How to execute R Scripts using Azure Batch Services and Azure Data Factory? | by Aditya Kaushal | Medium
Alternatively, to run R scripts on your VM, you can use one of the options below:
Custom Script Extension
Run Command
Hybrid Runbook Worker
Serial console
Reference: Run scripts in an Azure Windows VM - Azure Virtual Machines | Microsoft Docs
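To illustrate the Run Command option, here is a minimal sketch that builds an Azure CLI call from Python. It assumes the az CLI is installed and logged in on the machine running it; the resource group, VM name and script path are hypothetical.

```python
import subprocess


def run_command_args(resource_group, vm_name, script):
    """Build the az CLI arguments for running a PowerShell script on a Windows VM."""
    return [
        "az", "vm", "run-command", "invoke",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--command-id", "RunPowerShellScript",
        "--scripts", script,
    ]


def run_on_vm(resource_group, vm_name, script):
    """Execute the command; raises CalledProcessError if az exits with an error."""
    subprocess.run(run_command_args(resource_group, vm_name, script), check=True)


# Hypothetical names and path, for illustration only:
# run_on_vm("my-rg", "my-r-vm", r"Rscript C:\jobs\weekly.R")
```

The same mechanism could be used to run an R package update on the VM without logging into it interactively.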
Currently some of our jobs run on different Windows VMs, for example:
Task Scheduler, to run:
  PowerShell files
  .bat files
  Python files
SQL Agent jobs, to run SSIS packages
We are planning to use Airflow to trigger all these jobs to have better visibility and manage dependencies.
Our Airflow runs on Ubuntu.
I would like to know if there is any way to trigger the above-mentioned jobs on Windows via Airflow.
Can I get some examples on how to achieve my objectives? Please suggest what packages/libraries/plugins/operators I can use.
Yes, there is. I would start by looking into the WinRM operator and hook that you can find under Microsoft in the providers:
http://airflow.apache.org/docs/apache-airflow-providers-microsoft-winrm/stable/index.html
and maybe also:
https://github.com/diyan/pywinrm
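As a minimal sketch of the pywinrm route, the helper below starts an existing Task Scheduler task on a remote Windows host. It assumes WinRM is enabled on the target VM and that pywinrm is installed on the Airflow machine; the host, credentials and task name are hypothetical.

```python
def schtasks_run_command(task_name):
    """Build the Windows command that starts an existing Task Scheduler task."""
    return 'schtasks /Run /TN "{}"'.format(task_name)


def run_on_windows(host, user, password, script):
    """Run a PowerShell script on a remote Windows host and return its stdout."""
    import winrm  # pywinrm; imported lazily so the helper above works without it

    session = winrm.Session(host, auth=(user, password))
    result = session.run_ps(script)
    if result.status_code != 0:
        raise RuntimeError(result.std_err.decode(errors="replace"))
    return result.std_out.decode(errors="replace")


# Hypothetical usage, e.g. from a PythonOperator callable:
# run_on_windows("winvm01", "svc_airflow", "secret", schtasks_run_command("Nightly ETL"))
```

Raising on a non-zero exit code matters here: it lets Airflow mark the task as failed instead of silently succeeding.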
I am attempting to set up automated tests for our applications using a virtual machine environment.
What I would like to have is something like the following scenario:
The build server is automatically triggered to start an automated test for the application
A "build" script is then run, which consists of:
Copy the application files and a test script to a location accessible by the VM
Start the VM
In the VM, a special application looks in the shared folder and starts the test script
The test script does its job; results are output to the shared folder
The test script ends
The special application then deletes the test script
The special application somehow has the VM manager close the VM and revert to the previous snapshot
When the VM has exited, process the results and send them to the build server.
I am using TeamCity if that matters.
For virtual machines, we use VirtualBox but we are open to any other if needed.
Are there any applications or suites that would manage this scenario?
If there are none, I will code it myself. That should be easy; the only part I am not sure about is the handling of the virtual machine.
What I need is for the VM to close itself after the test and revert to a previous snapshot, since I want it to be in a known state for the next test.
Any pointers?
I have a similar setup running, and I chose to use Vagrant as it's the same thing our developers were using to normalize the development environment.
The initial state of the virtual machine was scripted using Puppet, but we didn't run the deployment scripts from scratch on each test, only once a day.
You could use Puppet/Chef for everything, but for all other operations on the VM we would use Fabric scripts, as they were used for the real deployment too and somehow fitted the way we worked better. In sum, the script would look something like the following:
vagrant up # fire up the vm, and run the puppet provisioning tool
fab vm run_test # run tests on vm
fab local process_result # process results on local shared folder
vagrant destroy # destroy the vm
The advantage is that your developers can also use Vagrant to mimic your production environment without having to take care of that themselves (i.e. changes to your database settings get synced to all your developers' VMs wherever they are), and the same scripts can be used in production too.
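If the four commands above are driven from a build step, one thing worth adding is that the VM gets destroyed even when a test fails. A minimal sketch, assuming vagrant and fab are on the PATH; the -f flag, which skips Vagrant's confirmation prompt, is an addition not present in the script above.

```python
import subprocess

# The same sequence as the shell script above
STEPS = [
    ["vagrant", "up"],                   # fire up the VM and run provisioning
    ["fab", "vm", "run_test"],           # run tests on the VM
    ["fab", "local", "process_result"],  # process results in the shared folder
]
CLEANUP = ["vagrant", "destroy", "-f"]   # -f: do not ask for confirmation


def run_cycle(run=subprocess.check_call):
    """Run all steps in order; always destroy the VM afterwards."""
    try:
        for step in STEPS:
            run(step)  # check_call raises on a non-zero exit code
    finally:
        run(CLEANUP)
```

The run parameter exists only so the sequence can be exercised without a real VM; a build server would call run_cycle() with the default.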
VirtualBox does have a COM API. I have no experience with it, but it may be possible to use that. One option would be to have TeamCity fire off a script to do this. I'd suggest starting with NAnt (supported natively by TeamCity) and possibly executing PowerShell if necessary.
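Besides the COM API, VirtualBox ships the VBoxManage command-line tool, which a TeamCity build step could call from the host. A minimal sketch of the start and revert-to-snapshot part, assuming VBoxManage is on the PATH; the VM and snapshot names are hypothetical.

```python
import subprocess


def vbox_commands(vm_name, snapshot_name):
    """Build the VBoxManage invocations for one test cycle."""
    return {
        "start": ["VBoxManage", "startvm", vm_name, "--type", "headless"],
        "poweroff": ["VBoxManage", "controlvm", vm_name, "poweroff"],
        "restore": ["VBoxManage", "snapshot", vm_name, "restore", snapshot_name],
    }


def reset_vm(vm_name, snapshot_name, run=subprocess.check_call):
    """Power the VM off and revert it to a known snapshot."""
    cmds = vbox_commands(vm_name, snapshot_name)
    run(cmds["poweroff"])  # fails if the VM is not running
    run(cmds["restore"])


# Hypothetical usage after the tests have written their results:
# reset_vm("test-vm", "clean-install")
```

With this approach the host decides when to shut the VM down, so the application inside the VM only needs to signal that the test is finished, for example by writing a marker file to the shared folder.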
Though I don't have any experience with either, I happen to have heard of a couple of applications in this space recently:
http://www.infoq.com/news/2011/05/virtual_machine_test_harness
http://www.automatedqa.com/techpapers/testcomplete/automated-testing-in-virtual-labs/