I am configuring a workflow in Oozie to execute a MapReduce task using a java action.
The workflow.xml I am using is shown below:
<workflow-app name="accesslogloader" xmlns="uri:oozie:workflow:0.1">
<start to="javamain"/>
<action name="javamain">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${namenode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<main-class>org.path.AccessLogHandler</main-class>
</java>
<ok to="end"/>
<error to="killjob"/>
</action>
<kill name="killjob">
<message>"Job killed due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
After I run the Oozie job, the MR job runs and saves data to HBase. I can see that the MR job completed because the data has been inserted into HBase.
But after completion, the Oozie UI shows the job in the KILLED state.
I see the following error in the syslog:
2014-03-13 00:20:23,425 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2014-03-13 00:20:24,311 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2014-03-13 00:20:24,315 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:589)
at java.io.FilterInputStream.close(FilterInputStream.java:181)
at org.apache.hadoop.util.LineReader.close(LineReader.java:149)
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:207)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
What can be the problem?
I have the same problem. My java action runs a series of complex jobs. It's definitely not good design, but it was the shortest way to reach the goal.
I've tried passing this property:
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
It doesn't help.
My hypothesis is that the java action runs longer than 10 minutes (the default timeout period for a MapReduce task), so the jobtracker kills it. My action runs for more than 10 minutes, and I never hit this problem when the action ran for less than 10 minutes. I've tried to pass the property
<property>
<name>mapred.task.timeout</name>
<value>7200000</value>
</property>
but it is not passed through.
Here is the action declaration:
<action name="long-running-java-action">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.queue.name</name>
<value>default</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>7200000</value>
</property>
<property> <!-- https://issues.apache.org/jira/browse/SQOOP-1226 ???? -->
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<main-class>my.super.mapreduce.Runner</main-class>
<java-opts>-Xmx4096m</java-opts>
<arg>--config</arg>
<arg>complexConfigGoesHere</arg>
</java>
<ok to="end"/>
<error to="kill"/>
</action>
I suppose the solution is to increase the task timeout.
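One detail that may explain why the timeout is "not passed": in a java action, the main class runs inside the map task of the Oozie launcher job, and Oozie sets a property on that launcher job only when it carries the oozie.launcher. prefix. A minimal sketch, assuming the launcher task is the one hitting the timeout (this follows the hypothesis above and is not confirmed by the post):

```xml
<configuration>
<!-- Assumption: the launcher map task is the one being timed out.
     The oozie.launcher. prefix asks Oozie to set the property on the launcher job. -->
<property>
<name>oozie.launcher.mapred.task.timeout</name>
<value>7200000</value>
</property>
<!-- Without the prefix, the property is only part of the action configuration. -->
<property>
<name>mapred.task.timeout</name>
<value>7200000</value>
</property>
</configuration>
```

On Hadoop 2 the newer property name is mapreduce.task.timeout, so that may be the name to prefix instead.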
I am unable to add or delete any files or directories on HDFS from a shell script that I am executing from an Oozie workflow.
The username is "scitest" and the hdfs path I am trying to edit/add/delete is
/user/scitest/.
In the shell script I am trying to delete a folder named test123456 from the path /user/scitest/.
---------------Error from oozie log------------------
429737-oozie-oozi-W#shell-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
2016-12-27 05:04:25,553 INFO ActionEndXCommand:520 - SERVER[vscihadoopvm2.manhdev.com] USER[scitest] GROUP[-] TOKEN[] APP[shell.workflow] JOB[0000041-161208225429737-oozie-oozi-W] ACTION[0000041-161208225429737-oozie-oozi-W#shell-node] ERROR is considered as FAILED for SLA
---------shell-script(sample.sh) content----------
#!/bin/bash
echo "`date` hi" > output.log
hadoop fs -mkdir test123456
-------job.properties---------
nameNode=hdfs://vscihadoopvm1.manhdev.com:8020
jobTracker=vscihadoopvm2.manhdev.com:8050
master=yarn-cluster
#user.name=yarn
queueName=default
examplesRoot=oozietest
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}
---workflow.xml---
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="shell.workflow">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>sample.sh</exec>
<file>sample.sh#sample.sh</file>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Error in Shell.Please refer the Oozie Logs</message>
</kill>
<end name="end"/>
</workflow-app>
@Abhiroy, it may be better to first check which user your Oozie action is being executed as. You can simply place the command id (without quotes) in your sample shell script and run the workflow. Then check the Oozie job logs to see which user the container executor runs your script as. From there we can start tracing any permission issues.
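If the id output shows that the script runs as a different user (for example yarn) on a cluster without Kerberos, one commonly used workaround is to set HADOOP_USER_NAME for the script through the shell action's env-var element. A sketch under that assumption; it has no effect on a secured cluster, and whether it fits this particular setup is not confirmed by the post:

```xml
<shell xmlns="uri:oozie:shell-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>sample.sh</exec>
<!-- Assumption: non-Kerberos cluster. wf:user() is the user who submitted the workflow. -->
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>sample.sh#sample.sh</file>
</shell>
```

The env-var element is placed between exec and file because the shell action schema expects the elements in that order.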
I have an Oozie workflow that performs a distcp operation.
The workflow file is below:
<workflow-app xmlns="uri:oozie:workflow:0.3" name="distcp-wf">
<start to="distcp-node"/>
<action name="distcp-node">
<distcp xmlns="uri:oozie:distcp-action:0.1">
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<prepare>
<delete path="${namenode}/tmp/mohit/"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queue_name}</value>
</property>
</configuration>
<arg>-m 1</arg>
<arg>${number_of_mapper}</arg>
<arg>-skipcrccheck</arg>
<arg>${namenode}/tmp/mohit/data.txt</arg>
</distcp>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>DistCP failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I want to set the number of mappers for distcp using -m.
How can I do that? I have tried
<arg>-m 1</arg>
and
<arg>1</arg>
but neither worked for me.
The error I am getting is below:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, Returned value from distcp is non-zero (-1)
java.lang.RuntimeException: Returned value from distcp is non-zero (-1)
Args are for the input and output, as described in the documentation.
The first arg indicates the input and the second arg indicates the output.
To change the number of mappers/reducers, use the configuration, for example:
<configuration>
<property>
<name>mapred.reduce.tasks</name>
<value>${firstJobReducers}</value>
</property>
</configuration>
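Separately from the configuration approach above, it may help to know that each arg element is handed to DistCp as a single command-line token, so <arg>-m 1</arg> arrives as one unrecognized option. A sketch that splits the flag and its value, assuming ${source_path} and ${target_path} are hypothetical properties holding the copy source and destination (they are not in the original post):

```xml
<distcp xmlns="uri:oozie:distcp-action:0.1">
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<!-- The flag and its value are separate tokens, so they go in separate arg elements. -->
<arg>-m</arg>
<arg>${number_of_mapper}</arg>
<!-- Hypothetical properties: source first, destination last. -->
<arg>${source_path}</arg>
<arg>${target_path}</arg>
</distcp>
```

Other DistCp options would follow the same pattern: one arg element per token, with the source and destination as the final two.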
I am trying to write a simple Oozie workflow to execute a simple Hive script. The Hive script itself executes without any issue, but I am having a problem with the workflow execution.
The workflow.xml is below:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="nielsen_dma_xref_intr_dly_load_wf">
<credentials>
<credential name="hive2_cred" type="hive2">
<property>
<name>hive2.jdbc.url</name>
<value>${hive2_jdbc_uri}</value>
</property>
<property>
<name>hive2.server.principal</name>
<value>${hive2_server_principal}</value>
</property>
</credential>
</credentials>
<start to="nielsen_dma_xref_intr_dly_load_wf_start"/>
<action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${hiveSiteXML}</job-xml>
<jdbc-url>${jdbc_hive}</jdbc-url>
<password>${password}</password>
<script>${scripts}/nielsen_dma_xref_load.hql</script>
<param>db_dbname_dbname=${db_dbname}</param>
</hive2>
<ok to="hive_load_nielsen_dma_xref_oozie"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="nielsen_dma_xref_intr_dly_load_wf_completed"/>
</workflow-app>
All files are placed under the /user/kchandr2/wf folder in HDFS.
I am executing the workflow using the following command, where the properties file is placed in the local directory /home/kchandr2/wf:
oozie job -oozie http://<<hostname>>:11000/oozie -config /home/kchandr2/wf/nielsen_dma_xref_intr_dly_load_wf.properties -run -verbose >> /home/kchandr2/wf/Logs/nielsen_dma_xref_intr_dly_load_wf_$(date '+%Y%m%d').log 2>&1
I am getting the error: E0706 : E0706: Node cannot transition to itself node [hive_load_nielsen_dma_xref_oozie]
In your hive2 action
<action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
you have an "ok" transition to the very same action node
<ok to="hive_load_nielsen_dma_xref_oozie"/>
This is not allowed. That's why you get the error:
Error: E0706 : E0706: Node cannot transition to itself node [hive_load_nielsen_dma_xref_oozie]
Oozie workflows are DAGs (directed acyclic graphs). You cannot have loops.
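A sketch of the corrected transitions, using only node names that already appear in the workflow from the question. The ok transition now points at the end node. The start node is also changed here because its original target, nielsen_dma_xref_intr_dly_load_wf_start, is not defined anywhere in the posted file and would likely be the next validation error:

```xml
<start to="hive_load_nielsen_dma_xref_oozie"/>
<action name="hive_load_nielsen_dma_xref_oozie" cred="hive2_cred">
<!-- the hive2 element from the question goes here, unchanged -->
<ok to="nielsen_dma_xref_intr_dly_load_wf_completed"/>
<error to="fail"/>
</action>
```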
I have the following workflow XML and Coordinator XML, both created through Hue Oozie Editor.
<workflow-app name="demo8" xmlns="uri:oozie:workflow:0.4">
<start to="cds4"/>
<action name="cds4">
<fs>
<mkdir path='${nameNode}/my/path/towritefile/${wf:conf(DATE)}'/>
</fs>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The Coordinator conf is below
<coordinator-app name="Demo4CoordinatorNew"
frequency="${coord:minutes(5)}"
start="2015-01-18T18:15Z" end="2015-01-19T10:46Z" timezone="US/Pacific"
xmlns="uri:oozie:coordinator:0.2">
<controls>
<concurrency>1</concurrency>
<execution>FIFO</execution>
</controls>
<action>
<workflow>
<app-path>${wf_application_path}</app-path>
<configuration>
<property>
<name>DATE</name>
<value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
I have executed the coordinator. The value passed for the DATE parameter is blank. Do you see any issue?
In short, I'm trying to create a folder in HDFS based on the time at which the workflow is triggered.
I also tried
<mkdir path='${nameNode}/my/path/towritefile/${wf:conf("DATE")}'/>
When I do this, it gives an error.
In the workflow, replace ${wf:conf(DATE)} with ${DATE}; that way it will be parameterized correctly.
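Applied to the fs action from the question, the change would look like this. The coordinator already passes DATE as a workflow property, so the sketch uses only names from the post:

```xml
<action name="cds4">
<fs>
<!-- DATE is supplied by the coordinator's configuration block -->
<mkdir path='${nameNode}/my/path/towritefile/${DATE}'/>
</fs>
<ok to="end"/>
<error to="kill"/>
</action>
```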
I am trying to run the map-reduce Oozie example on a Hortonworks distribution...
but it's not functional yet...
First, here are my custom Hadoop configs (from Ambari). I had to modify the core XML to correct my "impersonate" problem...
hadoop.proxyuser.oozie.groups=*
hadoop.proxyuser.oozie.hosts=*
That works well, but now I get this:
Error: E0803 : E0803: IO error, <openjpa-2.1.0-r422266:1071316 fatal store error>
org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back.
See the nested exceptions for details on the errors that occurred. FailedObject: org.apache.oozie.WorkflowJobBean#3f623c47
I have already found people with the same error but no solution... Maybe you can help me!
My job.properties (on local)
nameNode=hdfs://namenode01:8020
jobTracker=namenode01:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/oozie/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
and my workflow.xml (on HDFS)
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/oozie/examples/output-data/${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>org.apache.oozie.example.SampleMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.apache.oozie.example.SampleReducer</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/oozie/examples/input-data/text</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/oozie/examples/output-data/${outputDir}</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I am trying to execute my workflow with:
oozie job -oozie http://edgenode01:11000/oozie -config /home/oozie/examples/apps/no-op/job.properties -run
Thanks a lot!
Open Job History at your-hadoop-host:50030/jobhistory.jsp and find your job. There go to map tasks and see logs.
If you are using a Derby DB, check the DB location for any lock files owned by a user other than the intended one. If there are any, remove them, then stop and start Oozie again.