Is it possible to run a Unix script using Oozie outside the Hadoop cluster? - oozie

We have written a Unix batch script and it is hosted on a Unix server outside the Hadoop cluster. So is it possible to run that script via Oozie?
If so, how can this be achieved?

What is the script doing? If the script just needs to run regularly, you could also use a cron job or something like that.
Besides this, Oozie has an SSH action for running commands on remote hosts.
https://oozie.apache.org/docs/3.2.0-incubating/DG_SshActionExtension.html
Maybe you can work something out with that: log into the remote host, run the script, wait for completion, and continue from there.
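Conceptually, the SSH action logs into the remote host over passwordless, key-based SSH as a configured user and runs a command there. A minimal sketch of the setup and the equivalent manual commands, assuming batchuser@remote-unix-host and the script path are placeholders rather than values from the question:

# One-time: the Oozie SSH action needs passwordless, key-based SSH from the
# Oozie server host to the remote Unix host for the configured user.
ssh-copy-id batchuser@remote-unix-host

# What the SSH action then effectively does for you as a workflow step:
ssh batchuser@remote-unix-host '/opt/scripts/daily_batch.sh arg1 arg2'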

Related

Using Airflow to Run .bat file or PowerShell program located in remote Windows Box

Currently some of the jobs run on different Windows VMs. For example:
Task Scheduler to run PowerShell files, .bat files, and Python files
SQL Agent jobs to run SSIS packages
We are planning to use Airflow to trigger all these jobs so we get better visibility and can manage dependencies.
Our Airflow instance runs on Ubuntu.
I would like to know if there is any way to trigger the above-mentioned jobs on Windows via Airflow.
Can I get some examples of how to achieve this? Please suggest which packages/libraries/plugins/operators I can use.
Yes, there is. I would start by looking into the WinRM operator and hook that you find under Microsoft in the providers:
http://airflow.apache.org/docs/apache-airflow-providers-microsoft-winrm/stable/index.html
and maybe also:
https://github.com/diyan/pywinrm

How to configure oozie shell action to run on all nodes

I have to make an Oozie shell action run on all nodes, e.g. to create a parent directory for logs on each node's local disk.
Thanks in advance!
It is not possible, as far as I know.
But you can try the approaches below, initially proposed here:
A MapReduce action can run on all nodes, but it requires a Java application.
Hadoop Streaming with MapReduce shell scripts; you can launch it as an ssh or shell action in Oozie (see the sketch below).
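A rough sketch of the streaming approach, assuming the streaming jar location, the HDFS paths and create_log_dirs.sh are placeholders; note that the script runs on whichever nodes receive map tasks, which does not strictly guarantee covering every node in the cluster:

# create_log_dirs.sh would do something like: mkdir -p /var/log/myapp
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.job.reduces=0 \
  -input /tmp/streaming_dummy_input \
  -output /tmp/streaming_dummy_output \
  -mapper create_log_dirs.sh \
  -file create_log_dirs.sh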

Is there a best way to write automated scripts for PuTTY?

I have some Unix commands that I run daily (cd, ls, etc.) on a remote server from PuTTY. Basically I need to automate my daily tasks.
Can anyone suggest the best way to write the scripts?
You can use Plink; it is part of the PuTTY collection (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html).
For more details see this thread: putty from a batch file and a script?
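A minimal sketch, assuming myuser@myserver and commands.txt are placeholders for your own values; commands.txt holds the commands you run daily (cd, ls, etc.):

# Run a batch of commands from a file, without interactive prompts.
plink -ssh -batch myuser@myserver -m commands.txt

# Or run a single command directly.
plink -ssh -batch myuser@myserver "ls -l /var/log"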

How to easily execute R commands on remote server?

I use Excel + R on Windows on a rather slow desktop. I have full admin access to a very fast Ubuntu-based server. I am wondering: how can I remotely execute commands on the server?
What I can do is save the needed variables with saveRDS, load them on the server with readRDS, execute the commands on the server, and then save the results and load them on Windows.
But it is all very interactive and manual, and can hardly be done on a regular basis.
Is there any way to do this directly from R, like:
Connect to the server via e.g. ssh,
Transfer the needed objects (which can be specified manually),
Execute given code on the server and wait for the result,
Get the result.
I could run all of R remotely, but that would introduce network-related problems. Most R commands I run from within Excel are very fast and data-hungry. I just need to remotely execute some specific commands, not all of them.
Here is my setup.
Copy your code and data over using scp. (I use GitHub, so I clone my code from GitHub. This has the benefit of making sure my work is reproducible.)
(optional) Use sshfs to mount the remote folder on your local machine. This allows you to edit the remote files with your local text editor instead of over the ssh command line.
Put everything you want to run in an R script (on the remote server), then run it via ssh in R batch mode (a sketch follows below).
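A minimal sketch of those steps, assuming user@server and the paths are placeholders for your own values:

# 1. Copy code and data to the server (or clone your code from GitHub there).
scp -r ~/project user@server:~/project

# 2. (optional) Mount the remote folder locally so you can use your own editor.
mkdir -p ~/project-remote
sshfs user@server:/home/user/project ~/project-remote

# 3. Run the script on the server in R batch mode over ssh.
ssh user@server 'Rscript ~/project/analysis.R'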
There are a few options; the simplest is to exchange SSH keys so you do not have to enter SSH/SCP passwords manually all the time. After this, you can write a simple R script that will:
Save the necessary variables into a data file,
Use scp to upload the data file to the Ubuntu server,
Use ssh to run a remote script that processes the data (which you have just uploaded) and stores the result in another data file,
Again use scp to transfer the results back to your workstation.
You can use R's system command to run scp and ssh with the necessary options.
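The underlying commands would look roughly like this sketch, where user@server, the file names and remote_script.R are placeholder assumptions; each line can be wrapped in a system() call from R:

# One-time: exchange keys so scp/ssh stop prompting for a password.
ssh-copy-id user@server

# 1. Upload the variables saved locally with saveRDS.
scp input.rds user@server:/home/user/work/input.rds

# 2. Run the remote script that reads input.rds and writes results.rds.
ssh user@server 'Rscript /home/user/work/remote_script.R'

# 3. Fetch the results back to the workstation.
scp user@server:/home/user/work/results.rds results.rds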
Another option is to set up a cluster worker on the remote machine; then you can export the data using clusterExport and evaluate expressions using clusterEvalQ and clusterApply.
There are a few more options:
1) You can do this directly from R by using Rserve. See: https://rforge.net/
Keep in mind that Rserve can accept connections from R clients; see, for example, how to connect to Rserve with an R client.
2) You can set up a cluster on your Linux machine and then use those cluster facilities from your Windows client. The simplest is to use snow (https://cran.r-project.org/package=snow); also see foreach and many other cluster libraries.

Drupal Scheduler module cron

I'm using the Scheduler module to publish and unpublish my content at certain times; however, I would like publishing to happen more frequently than the Drupal cron itself runs.
There is an option within Scheduler to use a lightweight cron specifically for Scheduler, but I have never written a cron task before and simply do not know what I am doing. It gives an example of how I would write one, which is
/usr/bin/wget -O - -q "http://example.com/scheduler/cron"
To make sure I am getting this correctly, would this line (modified to point to my address) go into a file called cron.php?
I have tried doing the above, but it doesn't appear to be publishing my content.
No, you'll need to add this line to your crontab on the server. Talk to your hosting provider, they should be able to help you.
If you're running your own server, run this command from the shell:
crontab -e
And add your line last in that file.
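For example, a crontab entry that hits the Scheduler cron URL every five minutes (adjust the schedule and the URL to your own site):

*/5 * * * * /usr/bin/wget -O - -q "http://example.com/scheduler/cron"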
See here for more info.
