Using Airflow to run a .bat file or PowerShell program located on a remote Windows box - airflow

Currently some of our jobs run on different Windows VMs, for example:
Task Scheduler, to run:
  PowerShell files
  .bat files
  Python files
SQL Agent jobs, to run SSIS packages
We are planning to use Airflow to trigger all these jobs to have better visibility and manage dependencies.
Our Airflow runs on Ubuntu.
I would like to know if there is any way to trigger the above-mentioned jobs on Windows via Airflow.
Can I get some examples on how to achieve my objectives? Please suggest what packages/libraries/plugins/operators I can use.

Yes, there is. I would start by looking into the WinRM operator and hook, which you can find under Microsoft in the providers documentation:
http://airflow.apache.org/docs/apache-airflow-providers-microsoft-winrm/stable/index.html
and maybe also:
https://github.com/diyan/pywinrm
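As a rough sketch of how the WinRM operator and hook could be wired up (not a tested configuration): the connection ID `windows_box`, the DAG ID, and every path and job name below are hypothetical placeholders. It assumes Airflow 2.4 or later (for the `schedule` argument) with the `apache-airflow-providers-microsoft-winrm` package installed, and that WinRM is enabled on the Windows box.

```python
from datetime import datetime

# Hypothetical command lines, one per kind of job from the question.
# All paths and the SQL Agent job name are placeholders.
REMOTE_COMMANDS = {
    "run_bat_file": r"cmd.exe /c C:\jobs\nightly.bat",
    "run_powershell_script": (
        r"powershell.exe -NoProfile -ExecutionPolicy Bypass -File C:\jobs\export.ps1"
    ),
    # Assumes python is on the PATH of the remote Windows user.
    "run_python_script": r"python C:\jobs\load.py",
    # sp_start_job only starts the job; it does not wait for it to finish.
    "start_sql_agent_job": (
        "sqlcmd -S localhost -Q "
        "\"EXEC msdb.dbo.sp_start_job @job_name = N'Load SSIS Package'\""
    ),
}


def build_dag():
    """Build a DAG that runs each command on the Windows box over WinRM."""
    # Imported lazily so the command table can be inspected without Airflow.
    from airflow import DAG
    from airflow.providers.microsoft.winrm.hooks.winrm import WinRMHook
    from airflow.providers.microsoft.winrm.operators.winrm import WinRMOperator

    # "windows_box" is an assumed Airflow connection holding host and credentials.
    hook = WinRMHook(ssh_conn_id="windows_box")

    with DAG(
        dag_id="windows_jobs_example",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually; replace with a cron expression if needed
        catchup=False,
    ) as dag:
        tasks = [
            WinRMOperator(task_id=task_id, command=command, winrm_hook=hook)
            for task_id, command in REMOTE_COMMANDS.items()
        ]
        # Chain the tasks one after another to illustrate dependencies.
        for upstream, downstream in zip(tasks, tasks[1:]):
            upstream >> downstream
    return dag
```

In a real DAG file you would assign the result at module level (`dag = build_dag()`) so the scheduler can discover it.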

Related

How to execute Python Script through Informatica Cloud

I have a Python script that I need to execute and automate via IICS. The output of the script is a CSV file, which should be loaded to the target. How can I achieve this via Informatica Cloud? Please point me to some information and documentation on this.
Thanks
There are two ways to do this.
You can create an executable (using py2exe or a similar tool) from your Python script and put that file on the Informatica Cloud agent server. Then you can call it using a shell command. Note that in this case you do not need to install Python or any packages on the agent server.
You can also put the .py file on the agent server and run it from the shell, for example $PYTHON_HOME/python your_script.py. You need to make sure the Python version is compatible and that all required packages are installed on the agent server.
Refer to the screenshot below for how to set up the shell command. You can then run it as part of a workflow and schedule it if needed.
https://i.stack.imgur.com/wnDOV.png

Trigger mainframe job from AirFlow

May I know if Airflow supports mainframe jobs? Can we schedule mainframe jobs using Airflow?
Thanks in advance.
I do not know Airflow specifically, but we have used Ansible, Jenkins, and IBM UrbanCode Deploy for orchestration that includes both distributed and mainframe process parts.
You can SSH into z/OS and use Bash, Python, cURL, Node.js, or Groovy. You could submit JCL via REST APIs. There is a command line processor for Db2 to execute SQL and stored procedures via bash terminal. There is the new Zowe CLI that brings a modern command line interface to z/OS.
I would ask the question - what is the nature of what you want to be scheduled? What language is it written in, or what language do you want it written in? If something exists today, what is the process and how is it scheduled today?
While I haven't used Airflow, you can use modern interfaces to do things on z/OS, and frequently that is what is actually needed to integrate with orchestration tools.
Elaborating on Patrick Bossman's good summary, Apache Airflow definitely supports SSH connections to run commands and/or transfer files:
https://airflow.apache.org/howto/connection/ssh.html
z/OS includes OpenSSH as a standard, IBM supported feature in the base operating system at no additional charge, although it's possible it's not running in your particular z/OS installation. Dovetailed Technologies has published a helpful "Quick Install Guide" that explains how to configure and start OpenSSH on z/OS if it isn't configured already:
http://dovetail.com/docs/pt-quick-inst/pt-quick-inst-doc.pdf
Their reference points to IBM's official z/OS documentation if you need more information.
You may decide to have other connections to z/OS from Apache Airflow, but SSH is certainly an available option.
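A minimal sketch of the SSH route, assuming the `apache-airflow-providers-ssh` package and Airflow 2.4 or later: the connection ID `zos_ssh`, the DAG ID, and the script path are hypothetical, and the remote command is only a placeholder for whatever you actually need to run on z/OS.

```python
from datetime import datetime

# Hypothetical command to run in the z/OS UNIX shell over SSH.
ZOS_COMMAND = "sh /u/batch/run_nightly.sh"


def build_dag():
    """Build a one-task DAG that runs a command on z/OS over SSH."""
    # Imported lazily so this module can be read without Airflow installed.
    from airflow import DAG
    from airflow.providers.ssh.operators.ssh import SSHOperator

    with DAG(
        dag_id="zos_ssh_example",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually; set a cron expression to schedule it
        catchup=False,
    ) as dag:
        # "zos_ssh" is an assumed Airflow SSH connection pointing at the z/OS host.
        SSHOperator(
            task_id="run_on_zos",
            ssh_conn_id="zos_ssh",
            command=ZOS_COMMAND,
        )
    return dag
```

The task fails when the remote command exits with a non-zero status, which gives Airflow a simple signal of whether the mainframe step succeeded.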
FYI, it appears possible to run Apache Airflow directly on z/OS 2.4 itself. I haven't personally tried it, but it looks good to go. The recipe to do that would be as follows:
Configure and fire up the z/OS Container Extensions ("zCX"), a standard, included, IBM supported, no additional charge feature in z/OS 2.4 that's compatible with IBM z14 and higher model IBM Z machines.
Install and run a Python container (Docker/OCI format) on zCX, for example a Python container from DockerHub. You'll need a Python container image that includes "s390x" architecture support, either on its own or in a multi-architecture container. (No problem with DockerHub's image.)
Use pip to install Apache Airflow within your Python container, per normal.
Configure your SSH (and perhaps other) connection(s) from Airflow to the rest of z/OS, as described above.
You can also run Apache Airflow on Linux on Z/LinuxONE, either on the same IBM Z machine where z/OS runs or on a different machine. You can test Apache Airflow using the free (for up to 120 days) IBM LinuxONE Community Cloud, and you could even create your own custom Docker/OCI container on the LinuxONE Community Cloud for deployment to zCX.
It might even be possible to run Airflow on Python for z/OS, without zCX, although if so there'd be some more work involved. Python for z/OS is available from Rocket Software here:
https://www.rocketsoftware.com/product-categories/mainframe/python-for-zos

Is it possible to run a unix script using oozie outside hadoop cluster?

We have written a unix batch script and it is hosted on a unix server outside Hadoop Cluster. So is it possible to run that script via oozie?
If it is possible then how can this be achieved?
What is the script doing? If the script just needs to run regularly, you can just as well use a cron job or something similar.
Besides this, Oozie has an SSH action for running commands on remote hosts:
https://oozie.apache.org/docs/3.2.0-incubating/DG_SshActionExtension.html
Maybe you can work something out with that by logging into the remote host, running the script, waiting for completion, and continuing from there.

Can oozie control jobs outside of Hadoop?

From the documentation, it isn't very clear whether Oozie can schedule and control jobs outside of Hadoop. Can someone shed some light on this? If not, is there any open-source workflow engine that can do that?
Consider using Chronos (from Airbnb), an advanced version of cron with a UI, built on top of Mesos: airbnb.github.com/chronos/
Cheers.
I believe not. Oozie itself does not have a resource management policy; all it does is submit jobs to Hadoop's JobTracker at the right time. Besides, for each Oozie workflow there is one launcher job, which is responsible for submitting the real jobs in the workflow to Hadoop, and the launcher job is itself a Hadoop job. So I think that for versions earlier than Oozie 3.2, the answer is no.
You might consider trying Azkaban by LinkedIn. It was built specifically for Hadoop, but Unix commands can be specified in an Azkaban job file, so you can develop a workflow for any application that can be run from the command line.
I've been working on a new workflow engine called Soop (https://github.com/radixCSgeek/soop). It is very lightweight, simple to set up, and runs jobs using a cron-like syntax. It can run any Java POJO as well as shell processes, so you can kick off a bash script or whatever.

How to use a virtual machine with automated tests?

I am attempting to setup automated tests for our applications using a virtual machine environment.
What I would like to have is something like the following scenario:
The build server is automatically triggered to start an automated test for the application.
A "build" script is then run, which consists of:
Copy the application files and a test script to a location accessible by the VM
Start the VM
In the VM, a special application looks in the shared folder and starts the test script
The test script does its job, and the results are output to the shared folder
The test script ends
The special application then deletes the test script
The special application somehow has the VM manager close the VM and revert to the previous snapshot
When the VM has exited, process the results and send them to the build server.
I am using TeamCity if that matters.
For virtual machines, we use VirtualBox but we are open to any other if needed.
Is there any applications/suite that would manage this scenario?
If there are none, I will code it myself. That should be easy; the only part I am not sure about is the handling of the virtual machine.
What I need to be able to do is to have the VM close itself after the test and revert to a previous snapshot since I want it to be in a known state for the next test.
Any pointers?
I have a similar setup running, and I chose to use Vagrant because it's the same thing our developers were using to normalize the development environment.
The initial state of the virtual machine was scripted using Puppet, but we didn't run the deployment scripts from scratch on each test, only once a day.
You could use Puppet/Chef for everything, but for all other operations on the VM we used Fabric scripts, as they were used for the real deployment too and somehow fitted the way we worked better. In sum, the script would look something like the following:
vagrant up # fire up the vm, and run the puppet provisioning tool
fab vm run_test # run tests on vm
fab local process_result # process results on local shared folder
vagrant destroy # destroy the vm
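The four commands above could be driven from a small Python script that a build server calls. This is only a sketch: the `fab` task names are taken from the listing above, while `run_pipeline` and its injectable `run` argument are hypothetical helpers. Note that a plain `vagrant destroy` prompts for confirmation, so the sketch passes `--force` for unattended use.

```python
import subprocess

# Steps from the listing above, in order.
STEPS = [
    ["vagrant", "up"],                   # fire up the VM and provision it
    ["fab", "vm", "run_test"],           # run tests on the VM
    ["fab", "local", "process_result"],  # process results on the shared folder
]
# --force skips the interactive confirmation prompt.
CLEANUP = ["vagrant", "destroy", "--force"]


def run_pipeline(run=subprocess.run):
    """Run the steps in order; always destroy the VM, even if a step fails."""
    try:
        for step in STEPS:
            # check=True raises CalledProcessError on a non-zero exit code.
            run(step, check=True)
    finally:
        run(CLEANUP, check=True)


if __name__ == "__main__":
    run_pipeline()
```

The `try`/`finally` means a failing test run still tears the VM down, so the next build starts from a clean machine.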
The advantage is that your developers can also use Vagrant to mimic your production environment without having to take care of that themselves (i.e. changes to your database settings get synced to all your developers' VMs wherever they are), and the same scripts can be used in production too.
VirtualBox does have a COM API. I have no experience with it, but it may be possible to use that. One option would be to have TeamCity fire off a script to do this. I'd suggest starting with NAnt (supported natively by TeamCity) and possibly executing PowerShell if necessary.
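As an alternative to the COM API, such a script could shell out to the `VBoxManage` command-line tool that ships with VirtualBox. The sketch below swaps in that CLI rather than COM; the VM name, snapshot name, result file, and the `run_test_cycle` helper are hypothetical. It assumes the VM is powered off when the cycle starts and that the guest writes a result file to the shared folder when the tests finish.

```python
import subprocess
import time
from pathlib import Path


def vbox(*args):
    """Build a VBoxManage command line as an argument list."""
    return ["VBoxManage", *args]


def run_test_cycle(vm, snapshot, result_file, timeout_s=1800,
                   run=subprocess.run, sleep=time.sleep):
    """Restore a snapshot, boot the VM headless, wait for results, power off."""
    result = Path(result_file)
    # Restoring at the start puts the VM in a known state for every run.
    run(vbox("snapshot", vm, "restore", snapshot), check=True)
    run(vbox("startvm", vm, "--type", "headless"), check=True)
    try:
        waited = 0
        # Poll the shared folder until the guest has written its results.
        while not result.exists():
            if waited >= timeout_s:
                raise TimeoutError(f"no result file after {timeout_s} seconds")
            sleep(5)
            waited += 5
    finally:
        # Power off even on timeout so the next restore can succeed.
        run(vbox("controlvm", vm, "poweroff"), check=True)
    return result
```

A call such as `run_test_cycle("testvm", "clean", "/shared/results.xml")` could then be a TeamCity build step. Restoring at the start of each run, rather than at the end, gives the same known state while leaving the last VM state available for debugging.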
Though I don't have any experience with either, I happen to have heard of a couple applications in this space recently:
http://www.infoq.com/news/2011/05/virtual_machine_test_harness
http://www.automatedqa.com/techpapers/testcomplete/automated-testing-in-virtual-labs/
