can we create dynamic directory structure in oozie coordinator? - oozie

I want to create an event-driven Oozie coordinator, but the directory path changes regularly. I don't want to hard-code the directory in the code.
<datasets>
<dataset name="test_co" frequency="${coord:minutes(120)}" initial-instance="${coordStartDate}" timezone="${timezone}">
<uri-template>${nameNode}/dynamicName</uri-template>
<done-flag>_OK</done-flag>
</dataset>
</datasets>
How can I run a shell script before this action is triggered, so that it creates the folder name and checks whether the OK file is present inside that folder?

Oozie supports dynamic directory structures, i.e. dated directories, through coordinator datasets (if possible use with ).
For example:
<datasets>
<dataset name="logs" frequency="${coord:hours(1)}" initial-instance="2009-01-01T01:00Z" timezone="UTC">
<uri-template>hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
</dataset>
</datasets>
With the dataset above, the coordinator instance for today, 22-03-2017 at 16:00, resolves to
the following directory: hdfs://bar:9000/app/logs/2017/03/22/16
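To make the coordinator event driven, the dated dataset can be combined with a done-flag and an input event, so that each action waits until the flag file appears and then receives the resolved directory. The following is a minimal sketch, assuming the _OK flag from the question; the coordinator name, ${start}, ${end}, ${wfAppPath} and the inputDir property are hypothetical placeholders.

```xml
<coordinator-app name="dated-input-coord" frequency="${coord:hours(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <datasets>
    <!-- One dataset instance per hour; the path is resolved from the nominal time -->
    <dataset name="logs" frequency="${coord:hours(1)}"
             initial-instance="2009-01-01T01:00Z" timezone="UTC">
      <uri-template>hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      <!-- The instance counts as available only once this file exists in the directory -->
      <done-flag>_OK</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- Wait for the dataset instance that matches the current nominal time -->
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${wfAppPath}</app-path>
      <configuration>
        <!-- Hypothetical property name; hands the resolved directory to the workflow -->
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

With a setup like this, no shell script should be needed to build the path or test for the flag: Oozie resolves the template for each nominal time and holds the action until _OK exists in that directory. The workflow can then read the path as ${inputDir}.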

Related

Oozie shell action: exec and file tags

I'm a newbie in Oozie and I've read some Oozie shell action examples but this got me confused about certain things.
There are examples I've seen where there is no <file> tag.
Some examples, like this one from Cloudera, repeat the shell script name in the <file> tag:
<shell xmlns="uri:oozie:shell-action:0.2">
<exec>check-hour.sh</exec>
<argument>${earthquakeMinThreshold}</argument>
<file>check-hour.sh</file>
</shell>
The example on Oozie's website, on the other hand, writes the shell script (referenced as ${EXEC} from job.properties, which points to the script.sh file) twice, separated by #.
<shell xmlns="uri:oozie:shell-action:0.1">
...
<exec>${EXEC}</exec>
<argument>A</argument>
<argument>B</argument>
<file>${EXEC}#${EXEC}</file>
</shell>
There are also examples I've seen where the path (HDFS or local?) is prepended before the script.sh#script.sh within the <file> tag.
<shell xmlns="uri:oozie:shell-action:0.1">
...
<exec>script.sh</exec>
<argument>A</argument>
<argument>B</argument>
<file>/path/script.sh#script.sh</file>
</shell>
As I understand, any shell script file can be included in the workflow HDFS path (same path where workflow.xml resides).
Can someone explain the differences in these examples and how <exec>, <file>, script.sh#script.sh, and the /path/script.sh#script.sh are used?
<file>hdfs:///apps/duh/mystuff/check-hour.sh</file> means "download that HDFS file into the Current Working Dir of the YARN container that runs the Oozie Launcher for the Shell action, using the same file name by default, so that I can reference it as ./check-hour.sh or simply check-hour.sh in the <exec> element".
<file>check-hour.sh</file> means "download that HDFS file -- a relative path is resolved against the workflow application directory, i.e. the HDFS folder that holds workflow.xml -- into etc. etc.".
<file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file> means "download that HDFS file etc. etc., renaming it as youpi, so that I can reference it as ./youpi or simply youpi in the <exec> element".
Note that the Hue UI often inserts unnecessary # stuff with no actual name change. That's why you will see it so often.
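Putting the pieces together, a complete shell action that uses the # rename might look like the sketch below. The action name, the transition targets, and the ${jobTracker} and ${nameNode} properties are assumptions for illustration; the HDFS path is the one used in the explanation above.

```xml
<action name="check-hour">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- Must match the local name the file gets in the container's working directory -->
    <exec>youpi</exec>
    <argument>A</argument>
    <!-- HDFS source path, then # and the local name to use after download -->
    <file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

Dropping the #youpi suffix would keep the original file name, in which case <exec> would be check-hour.sh.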

Pass property file in WinRun4J

I am trying to translate a .bat file to an INI file so that I can use WinRun4J to launch a small Java app as a service.
Working from the demo that ships with the download, the web page https://github.com/poidasmith/winrun4j and a few samples that have been posted, I've come up with an .ini file that reads as...
terrainserver.class=ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
terrainserver.id=TerrainServer
terrainserver.name=WinRun4J TerrainServer terrainserver
terrainserver.description=Pegasus Terrain Service
classpath.1=*.jar
classpath.2=*.zip
arg.1=prjsrvConfig=.\prjsrv.properties
vmarg.1=-Xdebug
vmarg.2=-Xnoagent
vmarg.3=-Xrunjdwp:transport=dt_socket,address=2121,server=y,suspend=n
vm.heapsize.min.percent=256m
vm.heapsize.preferred=1000m
vm.location=C:\Program Files (x86)\Java\jdk1.7.0_55\jre\bin\server\jvm.dl
from the original batch file...
set JAVA_HOME=c:\jdk1.3.1_03
set PRJSRV_CLASSPATH=.\ProjServer.jar;.\ode.jar;.\classes12.zip;.\JAGR-client.jar;.\PegasusElevAdapter.jar
set PRJSRV_PARAM1=prjsrvConfig=.\prjsrv.properties
start %JAVA_HOME%\bin\java.exe -classpath %PRJSRV_CLASSPATH% -D%PRJSRV_PARAM1% -Xms256m -Xmx1000m ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
My question is: is using the arg key the correct method of setting a reference to the prjsrv.properties file? Or is there a better method? Java isn't my strongest language so please bear with me.
From what I can see your batch will have to be translated into:
vmarg.4=-DprjsrvConfig=.\prjsrv.properties
Besides that I think you need to rename these:
terrainserver.class=ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
terrainserver.id=TerrainServer
terrainserver.name=WinRun4J TerrainServer terrainserver
terrainserver.description=Pegasus Terrain Service
to
service.class=ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
service.id=TerrainServer
service.name=WinRun4J TerrainServer terrainserver
service.description=Pegasus Terrain Service
because WinRun4J does not recognise a terrainserver.* prefix; it expects service.* keys (or main.class) instead.

Workflow: Email contents of file

I have a bunch of pig scripts that I'm running as a workflow in oozie. Some of the output files are very short and there are a couple I'd like to concatenate and include in the body of an email action. How would I go about doing this?
Use a <shell> action and send the email from a script.
workflow.xml :
...
<shell>
<exec>email_hdfs_file.sh</exec>
<file>scripts/email_hdfs_file.sh</file>
</shell>
...
Make sure scripts/email_hdfs_file.sh exists in HDFS under the same folder as workflow.xml.
email_hdfs_file.sh :
#1 download and merge multiple files into one
hadoop fs -getmerge /path/to/your/files part-all.txt
#2 put a command that emails part-all.txt file here
It's up to you how to implement #2
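A variation on this approach, if the built-in email action is preferred, is to let the script only print the merged text and have Oozie send it. The sketch below rests on several assumptions: a hypothetical script summarize.sh that writes a single line of the form summary=... to standard output (captured output is parsed as Java properties and is size-limited), SMTP settings already configured on the Oozie server, and placeholder names for the nodes and the ${emailTo} property.

```xml
<workflow-app name="email-file-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="merge-node"/>
  <action name="merge-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- Hypothetical script; assumed to print "summary=<merged text>" on one line -->
      <exec>summarize.sh</exec>
      <file>scripts/summarize.sh</file>
      <!-- Makes the key=value pairs on stdout available to later nodes -->
      <capture-output/>
    </shell>
    <ok to="email-node"/>
    <error to="fail"/>
  </action>
  <action name="email-node">
    <email xmlns="uri:oozie:email-action:0.1">
      <to>${emailTo}</to>
      <subject>Pig output summary</subject>
      <!-- Reads the "summary" key captured from the shell action -->
      <body>${wf:actionData('merge-node')['summary']}</body>
    </email>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

This only suits very short outputs such as the ones described in the question; for anything larger, sending the mail from the script itself is likely the safer route.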

How to schedule a sqoop action using oozie

I am new to Oozie and am wondering how to schedule a Sqoop job using Oozie. I know a Sqoop action can be added as part of an Oozie workflow, but how can I schedule a Sqoop action so that it runs automatically, e.g. every 2 minutes or at 8pm every day (just like a cron job)?
You need to create a coordinator.xml file with a start, an end and a frequency. Here is an example:
<coordinator-app name="example-coord" xmlns="uri:oozie:coordinator:0.2"
frequency="${coord:days(7)}"
start="${start}"
end="${end}"
timezone="America/New_York">
<controls>
<timeout>5</timeout>
</controls>
<action>
<workflow>
<app-path>${wf_application_path}</app-path>
</workflow>
</action>
</coordinator-app>
Then create a coordinator.properties file like this one:
host=namenode01
nameNode=hdfs://${host}:8020
wf_application_path=${nameNode}/oozie/deployments/example
oozie.coord.application.path=${wf_application_path}
start=2013-07-13T07:00Z
end=2013-09-30T23:59Z
Upload your coordinator.xml file to hdfs and then submit your coordinator job with something like
oozie job -config coordinator.properties -run
Check the documentation at http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html; it contains some examples.
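The example above fires once a week. For the schedules asked about in the question, only the frequency (and, for a fixed time of day, the start time) needs to change. A minimal sketch, reusing the ${start}, ${end} and ${wf_application_path} properties from above; the coordinator name is a placeholder:

```xml
<!-- Runs the workflow every 2 minutes between start and end.
     For a run at 8pm every day, one option is frequency="${coord:days(1)}"
     together with a start time whose hour is 20:00 in UTC, for example
     2013-07-13T20:00Z; adjust the hour if 8pm is meant in another timezone. -->
<coordinator-app name="sqoop-every-2-min" xmlns="uri:oozie:coordinator:0.2"
                 frequency="${coord:minutes(2)}"
                 start="${start}" end="${end}" timezone="UTC">
  <action>
    <workflow>
      <!-- HDFS directory holding the workflow.xml that contains the Sqoop action -->
      <app-path>${wf_application_path}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Very short frequencies such as 2 minutes can pile up actions if a single run takes longer than the interval, so the workflow duration is worth checking before choosing the value.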
I think the following blog post will be quite useful:
http://www.tanzirmusabbir.com/2013/05/chunk-data-import-incremental-import-in.html

trying to use log4j.xml file within WinRun4j

Has anyone tried to use a log4j.xml reference within a WinRun4J service configuration? Here is a copy of my service.ini file. I have tried many configuration combinations; this is just my latest attempt.
service.class=org.boris.winrun4j.MainService
service.id=SimpleBacnetIpDataTransfer
service.name=Simple Backnet IP DataTransfer Service
service.description=This is the service for the Simple Backnet IP DataTransfer.
service.startup=auto
classpath.1=C:\Inbox\DataTransferClient-1.0-SNAPSHOT-jar-with-dependencies.jar
classpath.2=WinRun4J.jar
classpath.3=C:\Inbox\log4j-1.2.16.jar
arg.1=C:\Inbox\DataTransferClient.xml
log=C:\WinRun4J-Service\SimpleBacnetIpDataTransfer\NBP-DT-service.log
log.overwrite=true
log.roll.size=10MB
[MainService]
class=com.shiftenergy.ws.App
vmarg.1=-Xdebug
vmarg.2=-Xnoagent
vmarg.3=-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n
vmarg.4=-Dlog4j.configuration=file:C:\Inbox\log4j.xml
Within the log4j.xml file there is a reference to a log file that is written when the application runs. If I run java -jar -Dlog4j.configuration=file:C:\Inbox\log4j.xml ...., the log file is created accordingly. If I register my service and start it, the log file does not get created.
Has anyone had success using the -D log4j configuration with WinRun4J?
Thanks
I think that you provided the vmarg.4 parameter incorrectly. In your case it has to be like:
vmarg.4=-Dlog4j.configurationFile=[Path for log4j.xml]
I am also using the same and in my case, it is working perfectly fine. Please see below example:
vmarg.1=-Dlog4j.configurationFile=.\log4j2.xml
Have you tried setting the path in your code instead?
System.setProperty("log4j.configurationFile", "config/log4j.xml");
I'm using a relative path to a folder named config that contains log4j.xml. An absolute path is not recommended, but may work as well.
Just be sure to set this before making any calls to log4j, including any log4j config settings or static method calls!
System.setProperty("log4j.configurationFile", "config/log4j.xml");
final Logger log = Logger.getLogger(Main.class);
log.info("Starting up");
I didn't specify the log4j path in the ini file; I only placed the log4j.xml file in the same place as the jar.
I also did not call
System.setProperty("log4j.configurationFile", "config/log4j.xml");
In the Java project the file is stored in src/main/resources and is included in the jar, but that bundled copy is not the one used when a log4j.xml is placed outside the jar.

Resources