How to trigger an Oozie coordinator action with an external event

The Oozie documentation states: "Commonly, workflow jobs are run based on regular time intervals and/or data availability. And, in some cases, they can be triggered by an external event."
Does anyone have an idea how to trigger an action with an external event?

An external event can be the availability of a file in some directory, and the Oozie coordinator supports exactly this. It is useful when you need to trigger a second workflow on completion of a first, dependent workflow.
The second coordinator keeps polling for the availability of success_trigger.txt in ${triggerdirpath}, the HDFS path where success_trigger.txt is created by the first workflow:
<coordinator-app name="Xxx" frequency="${coord:days(1)}" start="${startTime2}" end="${endTime}" timezone="GMT-0700" xmlns="uri:oozie:coordinator:0.2">
<dataset name="check_for_SUCCESS" frequency="${coord:days(1)}" initial-instance="${startTime2}" timezone="GMT-0700">
<uri-template>${triggerdirpath}</uri-template>
<done-flag>success_trigger.txt</done-flag>
</dataset>
</datasets>
<input-events>
<data-in name="check_for_SUCCESS_data" dataset="check_for_SUCCESS">
<instance>${startTime2}</instance>
</data-in>
</input-events>
<action>
<workflow>
<app-path>${WF_path}</app-path>
<configuration>
<property><name>WaitForThisInputData</name><value>${coord:dataIn('check_for_SUCCESS_data')}</value></property>
<property><name>WhenToStart</name><value>${startTime2}</value></property>
<property><name>rundate</name><value>${coord:dataOut('currentFullDate')}</value></property>
<property><name>previousdate</name><value>${coord:dataOut('previousFullDate')}</value></property>
<property><name>currentyear</name><value>${coord:dataOut('currentYear')}</value></property>
<property><name>currentmonth</name><value>${coord:dataOut('currentMonth')}</value></property>
<property><name>currentday</name><value>${coord:dataOut('currentDay')}</value></property>
<property><name>previousbatchtime</name><value>${coord:formatTime(coord:dateOffset(coord:nominalTime(),-1,'DAY'),"yyyy-MM-dd")}</value></property>
<property><name>currentbatchtime</name><value>${coord:formatTime(coord:dateOffset(coord:nominalTime(),0,'DAY'),"yyyy-MM-dd")}</value></property>
<property><name>nextbatchtime</name><value>${coord:formatTime(coord:dateOffset(coord:nominalTime(),1,'DAY'),"yyyy-MM-dd")}</value></property>
</configuration>
</workflow>
</action>
</coordinator-app>
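For completeness, the first workflow has to create that done flag when it finishes. A minimal sketch of such a final step, assuming the first workflow is itself an Oozie workflow on a schema version that supports touchz in the fs action, and that ${triggerdirpath} resolves to the same HDFS directory (the action and transition names here are illustrative):
<action name="write-success-trigger">
  <fs>
    <!-- touchz creates an empty file; the coordinator above polls for it as its done-flag -->
    <touchz path="${triggerdirpath}/success_trigger.txt"/>
  </fs>
  <ok to="end"/>
  <error to="fail"/>
</action>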

Related

Corda Enterprise generates a details-node-.log file

At the beginning of the project we worked with open source Corda, and we used the command-line argument --logging-level=WARN to change the log level of the nodes.
When we started using Corda Enterprise, we noticed that a details-node-.log file was created. It is a log file that grows fast and is written at TRACE level.
Our question: can the logging in this file affect the performance of our CorDapps, and can we change the level of this log or disable it?
Corda Enterprise adds that logger; it is not present in open source Corda.
The only impact I can see for a CorDapp is probably disk space on the server, so if in your case this log file grows too big too quickly, it would be a good idea to configure it to avoid problems.
You can override the log4j configuration file and pass it to the jar like this:
java -Dlog4j.configurationFile=new-log-config.xml -jar <en-service>.jar
It is standard log4j, so you can also configure the rollover period and the size.
For reference, you can also take a look at the log4j.xml in open source Corda to see how the loggers are configured.
So you can probably override the logger you're concerned about with something like the following:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Properties>
    ...
    <Property name="detailLogLevel">TRACE</Property>
  </Properties>
  <Appenders>
    ...
    <RollingRandomAccessFile name="Detailed-RollingFile-Appender"
                             fileName="${log-path}/details-${log-name}.log"
                             filePattern="${archive}/details-${log-name}.%date{yyyy-MM-dd}-%i.log.gz">
      <Policies>
        ...your policies...
      </Policies>
      <DefaultRolloverStrategy>
        ...your strategy...
      </DefaultRolloverStrategy>
    </RollingRandomAccessFile>
  </Appenders>
  <Loggers>
    ...
    <Logger name="DetailedInfo" additivity="false" level="${detailLogLevel}">
      <AppenderRef ref="Detailed-RollingFile-Appender"/>
    </Logger>
  </Loggers>
</Configuration>
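Since the DetailedInfo logger reads its level from the detailLogLevel property, lowering that property in your override file should be enough to quiet it. A sketch based on standard log4j2 level handling (not verified against the Corda Enterprise docs):
<!-- WARN sharply reduces output; OFF disables the DetailedInfo logger entirely -->
<Property name="detailLogLevel">WARN</Property>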

Can someone let me know why the following Oozie coordinator is running in a loop

I am new to Oozie. I was testing the following coordinator.xml; when I submit the job it runs in a loop, but I want it to run every day at 1:00 am. Can someone tell me what mistake I am making?
<coordinator-app name="cron-coord-jon" frequency="0 1 * * *" start="2009-01-01T05:00Z" end="2036-01-01T06:00Z" timezone="UTC"
xmlns="uri:oozie:coordinator:0.2">
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
Your coordinator is likely not running in a loop, but rather submitting every 'missed' job since the start date you specified. Set the start date to the current day (e.g. 2019-06-03T00:00Z) and relaunch your coordinator.
If the start time is before 01:00, you should see a single job launched for the day.
You may want to pass this in as a variable. Here is the call to date that will provide the current date and time in the correct format:
date -u "+%Y-%m-%dT%H:%MZ"
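Putting the two together, a sketch of a launch command that avoids the catch-up backlog, assuming the coordinator's start attribute is parameterized as ${startTime} and job.properties holds the remaining configuration (the Oozie URL here is illustrative):
# Compute "now" in the format Oozie expects and pass it as the coordinator start.
oozie job -oozie http://localhost:11000/oozie \
    -config job.properties \
    -DstartTime=$(date -u "+%Y-%m-%dT%H:%MZ") \
    -run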

Is it possible to run an Oozie Spark Action without specifying inputDir & outputDir

According to https://oozie.apache.org/docs/3.3.1/WorkflowFunctionalSpec.html#a4.1_Workflow_Job_Properties_or_Parameters, we know:
When submitting a workflow job for the workflow definition above, 3 workflow job properties must be specified:
jobTracker:
inputDir:
outputDir:
I have a PySpark script that specifies its input and output locations within the script itself. I don't need or want an inputDir and outputDir in my workflow XML. When running my PySpark script via Oozie, I get these messages:
WARN ParameterVerifier:523 - SERVER[<my_server>] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal parameters in its XML definition
WARN JobResourceUploader:64 - SERVER[<my_server>] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-05-24 11:52:29,844 WARN JobResourceUploader:171 - SERVER[<my_server>] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
Based on https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/util/ParameterVerifier.java, my first warning is caused by the fact that I don't have an "inputDir":
else {
    // Log a warning when the <parameters> section is missing
    XLog.getLog(ParameterVerifier.class).warn("The application does not define formal parameters in its XML "
            + "definition");
}
Can I get around this at all ?
Update -- my XML structure
<action name="spark-node">
<spark xmlns="uri:oozie:spark-action:0.1" >
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
<master>yarn-master</master>
<!-- <mode>client</mode> -->
<name>oozie_test</name>
<jar>oozie_test.py</jar>
<spark-opts>--num-executors 1 --executor-memory 10G --executor-cores 1 --driver-memory 1G</spark-opts>
</spark>
<ok to="end" />
<error to="fail" />
</action>
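Per the quoted ParameterVerifier logic, the warning fires only because the workflow XML lacks a <parameters> section entirely. A hedged sketch of a minimal <parameters> block, placed right after the <workflow-app> opening tag, declaring a formal parameter with a default so nothing extra must be passed at submission time (the parameter name is illustrative; it does not have to be inputDir):
<!-- Declaring any formal parameter satisfies ParameterVerifier's check for a
     <parameters> section; the default value keeps submission unchanged. -->
<parameters>
  <property>
    <name>queueName</name>
    <value>default</value>
  </property>
</parameters>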

Is "Event-based push" enabled in Artifactory Pro?

Artifactory Professional 5.9.0 rev 50900900
According to the documentation, Artifactory Pro supports "Event-based push" replication.
I created 2 test repos with replication from the first to the second and "Enable Event Replication" enabled.
(screenshot)
But when I upload an artifact, I see it in the second repo only after 5 minutes (when the cron replication runs). There is no immediate event replication, and no replication events in the log except the cron ones.
In order to assist you with the issue you are experiencing, please share the repository replication configuration (mask the URL and the user/pass).
In addition, add the below to your '$artifactory_home/etc/logback.xml' at the end of the file, just above '</configuration>':
<appender name="repli" class="ch.qos.logback.core.rolling.RollingFileAppender">
<File>${artifactory.home}/logs/replication.log</File>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<FileNamePattern>${artifactory.home}/logs/replication.%i.log</FileNamePattern>
<MinIndex>1</MinIndex>
<MaxIndex>9</MaxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<MaxFileSize>25MB</MaxFileSize>
</triggeringPolicy>
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="org.jfrog.common.logging.logback.layout.BackTracePatternLayout">
<pattern>%date [%thread] [%-5p] \(%-20c{3}:%L\) - %m%n</pattern>
</layout>
</encoder>
</appender>
<logger name="org.artifactory.addon.replication" additivity="false" >
<level value="trace" />
<appender-ref ref="repli" />
This will create a new log file ($artifactory_home/logs/replication.log) that will hold the replication operations.
Once you have done so, try to deploy a file to the repository with event replication enabled.
After the log has been written, please share the data from the log file.
Event-based push replication is not enabled immediately after you enter the license key; in our case it started working after about a week. A reboot may be needed after the license key is entered.

Can Oozie HDFS action use file patterns or glob?

Can I use wildcards (e.g. *) or file patterns (e.g. {}) in Oozie move actions?
I am trying to move the results of my job into an archive directory.
State of the directory structure:
output
- 201304
- 201305
archive
- 201303
My action:
<fs name="archive-files">
<move source="hdfs://namenode/output/{201304,201305}"
target="hdfs://namenode/archive" />
<ok to="next"/>
<error to="fail"/>
</fs>
resulting error:
FS006: move, source path [hdfs://namenode/output/{201304,201305}] does not exist
Is there an easy way to move more than one file with a glob or bash-like syntax? I'm looking to do something similar to this hadoop command:
hadoop fs -mv hdfs://namenode/output/{201304,201305} hdfs://namenode/archive
Am I missing something? The hadoop fs command accepts globs. Does Oozie?
The Oozie HDFS action has quite limited functionality, which is fully described in the functional specification. To do something more complicated you can use a Shell action, which allows you to run arbitrary shell commands as part of a workflow, e.g. hadoop fs in your case.
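A minimal sketch of such a Shell action (the action name, paths, and schema version are illustrative; hadoop fs expands the {} glob itself, so no shell interpolation is needed, but the hadoop binary must be available on the node that runs the launcher):
<action name="archive-files">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>hadoop</exec>
    <argument>fs</argument>
    <argument>-mv</argument>
    <argument>hdfs://namenode/output/{201304,201305}</argument>
    <argument>hdfs://namenode/archive</argument>
  </shell>
  <ok to="next"/>
  <error to="fail"/>
</action>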
No - from my experience it doesn't look like it works:
FS006: move, source path [hdfs://nodename:8020/projects/blah/201*.gz] does not exist
In workflow.xml use this:
<action name="Movefiles">
<fs>
<move source='${SourcePath}' target='${DestinationPath}'/>
</fs>
<ok to="end"/>
<error to="fail"/>
</action>
and in job.properties write:
SourcePath=output/*/
DestinationPath=archive
