Oozie custom EL function for coordinator

I want to create a custom EL function that I will use inside an Oozie coordinator. My custom EL function works fine, but when I pass an already defined Oozie EL function to it as a parameter, it throws an exception.
coordinator.xml
I call ${coord:dateToEpoch(coord:nominalTime(), "yyyy-MM-dd'T'hh:mmZ")} as in the example below:
<datasets>
<dataset name="input1" frequency="${inputDataSetFrequence}" initial-instance="${initialInstance}"
timezone="${timezone}">
<uri-template>${inputBasePath}/${useCaseName}/bintime=${coord:dateToEpoch(coord:nominalTime(), "yyyy-MM-dd'T'hh:mmZ")}
</uri-template>
<done-flag></done-flag>
</dataset>
</datasets>
<input-events>
<data-in name="coordInput1" dataset="input1">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
Configuration that I used to test this:
<property>
<name>oozie.service.ELService.ext.functions.coord-job-submit-nofuncs</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
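For reference, here is a minimal sketch of what such an EL function could look like. The real com.mobileum.oozie.MobileumELFunctions class is not shown in the question, so the method body below is an assumption; only the class#method mapping comes from the configuration above, and Oozie EL functions must be public static methods.
package com.mobileum.oozie;

import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class MobileumELFunctions {

    // Parses the given date string with the given pattern and returns the
    // epoch time in seconds as a string. Hypothetical implementation.
    public static String dateToEpoch(String date, String pattern) throws Exception {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return String.valueOf(sdf.parse(date).getTime() / 1000L);
    }
}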
A call with a constant parameter works, but a call with a dynamic parameter does not.
Working call
${coord:dateToEpoch("2009-01-01T08:00UTC", "yyyy-MM-dd'T'hh:mmZ")}
Failing call (throws the exception)
${coord:dateToEpoch(coord:nominalTime(), "yyyy-MM-dd'T'hh:mmZ")}
I tried all of these properties:
<property>
<name>oozie.service.ELService.ext.functions.job-submit</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.workflow</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.wf-sla-submit</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-job-submit-freq</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-job-submit-data</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-sla-create</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-sla-submit</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-action-create</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-action-create-inst</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-action-start</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-job-wait-timeout</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.bundle-submit</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
<property>
<name>oozie.service.ELService.ext.functions.coord-job-submit-initial-instance</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
and
<property>
<name>oozie.service.ELService.ext.functions.coord-job-submit-nofuncs</name>
<value>coord:dateToEpoch=com.mobileum.oozie.MobileumELFunctions#dateToEpoch</value>
</property>
EXCEPTION
Caused by: java.lang.Exception: Unable to evaluate :${inputBasePath}/${useCaseName}/bintime=${coord:dateToEpoch(coord:nominalTime(), "yyyy-MM-dd'T'hh:mmZ")}:
at org.apache.oozie.coord.CoordELFunctions.evalAndWrap(CoordELFunctions.java:743)
at org.apache.oozie.command.coord.CoordSubmitXCommand.resolveTagContents(CoordSubmitXCommand.java:1002)
... 37 more
Caused by: javax.servlet.jsp.el.ELException: No function is mapped to the name "coord:nominalTime"
at org.apache.commons.el.Logger.logError(Logger.java:481)
at org.apache.commons.el.Logger.logError(Logger.java:498)
at org.apache.commons.el.Logger.logError(Logger.java:525)
at org.apache.commons.el.FunctionInvocation.evaluate(FunctionInvocation.java:150)
at org.apache.commons.el.FunctionInvocation.evaluate(FunctionInvocation.java:163)
at org.apache.commons.el.ExpressionString.evaluate(ExpressionString.java:114)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:274)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:204)
at org.apache.oozie.coord.CoordELFunctions.evalAndWrap(CoordELFunctions.java:734)

Not all Oozie EL constructs are evaluated inside uri-template. Please refer to the Synchronous Datasets section of the coordinator specification for more details. Below is an excerpt on uri-template:
uri-template: The URI template that identifies the dataset and can be
resolved into concrete URIs to identify a particular dataset instance.
The URI template is constructed using:
constants: See the allowable EL Time Constants below. Ex: ${YEAR}/${MONTH}.
variables: Variables must be resolved at the time a coordinator job is submitted to the coordinator engine. They are normally provided as job parameters (configuration properties). Ex: ${market}/${language}
The following EL constants can be used within synchronous dataset URI templates:
YEAR
MONTH
DAY
HOUR
MINUTE
The problem is not in your custom EL function implementation. Even the following, which uses only the built-in function, will not work:
<uri-template>${inputBasePath}/${useCaseName}/bintime=${coord:nominalTime()}</uri-template>
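Given that restriction, one workaround (a sketch, not something stated in the answer above) is to encode the bin with the allowed EL time constants instead of an epoch value, for example:
<uri-template>${inputBasePath}/${useCaseName}/bintime=${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
and do any epoch conversion later, inside the workflow action, where custom EL functions or plain Java code can run. This does require the dataset directories on HDFS to be laid out by date components rather than by epoch value.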

Related

Container is running beyond virtual memory limits

When I run the RHadoop example, the errors below occur.
is running beyond virtual memory limits. Current usage: 121.2 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
hadoop streaming failed with error code 1
How can I fix it?
My Hadoop settings:
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/usr/local/hadoop-2.7.3/data/yarn/nm-local-dir</value>
</property>
<property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value>/usr/local/hadoop-2.7.3/data/yarn/system/rmstore</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>0.0.0.0:8089</value>
</property>
</configuration>
I got almost the same error while running a Spark application on a YARN cluster.
"Container [pid=791,containerID=container_1499942756442_0001_02_000001] is running beyond virtual memory limits. Current usage: 135.4 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container."
I resolved it by disabling the virtual memory check in yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
This one setting was enough in my case.
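For reference, the 2.1 GB virtual limit in the message is not arbitrary: it is yarn.nodemanager.vmem-pmem-ratio (default 2.1) multiplied by the container's 1 GB physical allocation, so raising that ratio or the container memory is an alternative to switching the virtual memory check off entirely.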
I referred to the site below.
http://crazyadmins.com/tag/tuning-yarn-to-get-maximum-performance/
There I learned that I could change the MapReduce memory allocation, so I changed mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2000</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2000</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1600m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1600m</value>
</property>
</configuration>
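As a sanity check on the values above (a general rule of thumb, not something stated in the original answer): mapreduce.map.java.opts and mapreduce.reduce.java.opts carry JVM flags such as -Xmx1600m, and the heap is usually kept around 75-80% of the matching mapreduce.*.memory.mb so that heap plus JVM overhead still fits inside the 2000 MB container that YARN enforces.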

OOZIE 4.2.0 Start-up Error: Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found

I have configured and built Oozie 4.2.0 with the following components:
* Hadoop - 2.6.0
* OS - Mac 10.7
* Hive - 1.1.2
* Hbase - 1.2.1
The embedded Tomcat version shipped with the Oozie distribution is 6.0.43. As per the Oozie install and configuration instructions, the server should run at localhost:11000/oozie, but I get the error below when I start Oozie and check its status.
Command (I have removed the http:// as I am not allowed more than 2 links in the question):
oozie admin -oozie localhost:11000/oozie -status
Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found
My oozie-site.xml configuration is as below.
<property>
<name>oozie.system.id</name>
<value>oozie-iMac</value>
<description>The Oozie system ID.</description>
</property>
<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>false</value>
<description>Specifies whether security is enabled or not.
If disabled any user can manage Oozie system and manage any job.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
<value>false</value>
<description>Indicates if Oozie is configured to use Kerberos.</description>
</property>
<property>
<name>oozie.authentication.type</name>
<value>simple</value>
<description> Defines Authentication used for Oozie HTTP endpoint.
Supported values: simple|kerberos|#AUTHENTICATION_HANDLER_CLASSNAME#
</description>
</property>
<property>
<name>oozie.services</name>
<value>
org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.InstrumentationService,
org.apache.oozie.service.MemoryLocksService,
org.apache.oozie.service.UUIDService,
org.apache.oozie.service.ELService,
org.apache.oozie.service.AuthorizationService,
org.apache.oozie.service.UserGroupInformationService,
org.apache.oozie.service.HadoopAccessorService,
org.apache.oozie.service.JobsConcurrencyService,
org.apache.oozie.service.URIHandlerService,
org.apache.oozie.service.DagXLogInfoService,
org.apache.oozie.service.SchemaService,
org.apache.oozie.service.LiteWorkflowAppService,
org.apache.oozie.service.JPAService,
org.apache.oozie.service.CallbackService,
org.apache.oozie.service.ActionService,
org.apache.oozie.service.ShareLibService,
org.apache.oozie.service.CallableQueueService,
org.apache.oozie.service.ActionCheckerService,
org.apache.oozie.service.RecoveryService,
org.apache.oozie.service.PurgeService,
org.apache.oozie.service.CoordinatorEngineService,
org.apache.oozie.service.BundleEngineService,
org.apache.oozie.service.DagEngineService,
org.apache.oozie.service.CoordMaterializeTriggerService,
org.apache.oozie.service.StatusTransitService,
org.apache.oozie.service.PauseTransitService,
org.apache.oozie.service.GroupsService,
org.apache.oozie.service.ProxyUserService,
org.apache.oozie.service.XLogStreamingService,
org.apache.oozie.service.JvmPauseMonitorService,
org.apache.oozie.service.SparkConfigurationService
</value>
<description>
All services to be created and managed by Oozie Services singleton.
Class names must be separated by commas.
</description>
</property>
<property>
<name>oozie.db.schema.name</name>
<value>oozie</value>
<description> Oozie DataBase Name </description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
<description> JDBC driver class. </description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://localhost:3307/oozie?createDatabaseIfNotExist=true</value>
<description> JDBC URL. for MySQL DB connection </description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
<description> DB user name. </description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie</value>
<description> DB user password (leave 1 blank space if empty) </description>
</property>
<!-- Added as per Apache OOZIE Install documentation -->
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>false</value>
</property>
<property>
<name>oozie.service.JPAService.validate.db.connection</name>
<value>false</value>
</property>
<property>
<name>oozie.service.JPAService.pool.max.active.conn</name>
<value>10</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.keytab.file</name>
<value>http.keytab</value>
<description>Location of the Oozie user keytab file</description>
</property>
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
<description> Proxy User Setup for Oozie runs </description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
<description> Proxy Group configuration for Oozie Run
</description>
</property>
When I open the same address in the browser, it shows the error below:
(screenshot: Tomcat error page in Safari)
What am I doing wrong here?
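A 404 at /oozie usually means the Oozie web application itself never deployed into the embedded Tomcat. Assuming a standard 4.2.0 build, the usual checks are that bin/oozie-setup.sh prepare-war completed without errors, that oozie.war ended up under the oozie-server/webapps directory, and that logs/oozie.log and logs/catalina.out show no deployment failures; once the webapp is up, curl http://localhost:11000/oozie/versions should return the supported API versions.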

Job submitted in oozie is getting killed

I am configuring a workflow in Oozie to execute a MapReduce task using a java action.
The workflow.xml used is below:
<workflow-app name="accesslogloader" xmlns="uri:oozie:workflow:0.1">
<start to="javamain"/>
<action name="javamain">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${namenode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<main-class>org.path.AccessLogHandler</main-class>
</java>
<ok to="end"/>
<error to="killjob"/>
</action>
<kill name="killjob">
<message>"Job killed due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
After running the Oozie job, the MR job runs and saves data to HBase. I can see that the MR job completed, as the data is inserted into HBase.
But after completion, the Oozie UI shows the KILLED state.
I am seeing the following error in the syslog:
2014-03-13 00:20:23,425 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2014-03-13 00:20:24,311 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2014-03-13 00:20:24,315 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:589)
at java.io.FilterInputStream.close(FilterInputStream.java:181)
at org.apache.hadoop.util.LineReader.close(LineReader.java:149)
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:207)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
What can be the problem?
I have the same problem. My java action runs a series of complex jobs. Definitely not good design, but it was the shortest way to reach the goal.
I've tried to pass this property:
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
It doesn't help.
My hypothesis is that the java action runs longer than 10 minutes (the default timeout period for a MapReduce task), so the JobTracker kills it. My action runs for more than 10 minutes, and I never hit this problem when the action ran for less than 10 minutes. I've tried to pass the property
<property>
<name>mapred.task.timeout</name>
<value>7200000</value>
</property>
but it does not take effect.
Here is the action declaration:
<action name="long-running-java-action">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.queue.name</name>
<value>default</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>7200000</value>
</property>
<property> <!-- https://issues.apache.org/jira/browse/SQOOP-1226 ???? -->
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<main-class>my.super.mapreduce.Runner</main-class>
<java-opts>-Xmx4096m</java-opts>
<arg>--config</arg>
<arg>complexConfigGoesHere</arg>
</java>
<ok to="end"/>
<error to="kill"/>
</action>
I suppose the solution should be to increase the task timeout.
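One detail worth checking with this hypothesis: configuration intended for the launcher job of a java action normally needs the oozie.launcher. prefix (for example oozie.launcher.mapred.task.timeout); a bare mapred.task.timeout in the action's <configuration> ends up in the action configuration handed to the main class, not necessarily in the configuration of the launcher map task that Oozie itself runs.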

Oozie InputFormat MapReduce API

I'm trying to create an Oozie job with a custom InputFormat. I am using the new API and have set:
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
The property name I'm trying is:
<property>
<name>mapreduce.job.inputformat.class</name>
<value>org.lab41.dendrite.generator.kronecker.mapreduce.lib.input.QuotaInputFormat</value>
</property>
Is this the correct property name?
You can see this page:
https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook
The correct property name should be 'mapred.input.format.class', so you can write it like this:
<property>
<name>mapred.input.format.class</name>
<value>org.lab41.dendrite.generator.kronecker.mapreduce.lib.input.QuotaInputFormat</value>
</property>

Starting an Extension in Alfresco 4.0

I want to run the extension
("C:\Alfresco\tomcat\shared\classes\alfresco\extension\scheduled-action-services-context.xml)
I have made some changes in it, however when I restart the server I don't see it in my log files.
Do I need something else to start it.
EDIT:
Here is the content of my scheduled action services:
<!--
Define the model factory used to generate object models suitable for use with freemarker templates.
-->
<bean id="templateActionModelFactory" class="org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory">
<property name="serviceRegistry">
<ref bean="ServiceRegistry"/>
</property>
</bean>
<!--
Execute the script /Company Home/Record Management/testscript.js
-->
<bean id="runScriptAction" class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
<property name="actionName">
<value>script</value>
</property>
<property name="parameterTemplates">
<map>
<entry>
<key>
<value>script-ref</value>
</key>
<!-- Note that as of Alfresco 4.0, due to a Spring upgrade, the FreeMarker ${foo} entries must be escaped -->
<value>\$\{selectSingleNode('workspace://SpacesStore', 'lucene', 'PATH:"/app:company_home/app:dictionary/app:scripts/cm:send_mail.js"' )\}</value>
</entry>
</map>
</property>
<property name="templateActionModelFactory">
<ref bean="templateActionModelFactory"/>
</property>
<property name="dictionaryService">
<ref bean="DictionaryService"/>
</property>
<property name="actionService">
<ref bean="ActionService"/>
</property>
<property name="templateService">
<ref bean="TemplateService"/>
</property>
</bean>
<!--
Run the script every minute - select the single node company home that is not used ...
-->
<bean id="runScriptActionTrigger" class="org.alfresco.repo.action.scheduled.CronScheduledQueryBasedTemplateActionDefinition">
<property name="transactionMode">
<value>UNTIL_FIRST_FAILURE</value>
</property>
<property name="compensatingActionMode">
<value>IGNORE</value>
</property>
<property name="queryLanguage">
<value>lucene</value>
</property>
<property name="stores">
<list>
<value>workspace://SpacesStore</value>
</list>
</property>
<property name="queryTemplate">
<!--<value>+#ia\:fromDate:\$\{luceneDateRange(now, \"P10D\")\} AND +PATH:"/app:company_home/st:sites/cm:prova/cm:calendar//*"</value>-->
<value>+PATH:"/app:company_home/st:sites/cm:valdel/cm:calendar//*" AND +#ia\:fromDate:[NOW TO MAX]</value>
</property>
<property name="cronExpression">
<value>0 * 8 * * ?</value>
</property>
<property name="jobName">
<value>jobD</value>
</property>
<property name="jobGroup">
<value>jobGroup</value>
</property>
<property name="triggerName">
<value>triggerD</value>
</property>
<property name="triggerGroup">
<value>triggerGroup</value>
</property>
<property name="scheduler">
<ref bean="schedulerFactory"/>
</property>
<property name="actionService">
<ref bean="ActionService"/>
</property>
<property name="templateActionModelFactory">
<ref bean="templateActionModelFactory"/>
</property>
<property name="templateActionDefinition">
<ref bean="runScriptAction"/> <!-- This is name of the action (bean) that gets run -->
</property>
<property name="transactionService">
<ref bean="TransactionService"/>
</property>
<property name="runAsUser">
<value>System</value>
</property>
</bean>
And when I check the stdout, I'm seeing this error:
2012-03-30 11:00:00,230 ERROR [freemarker.runtime] [DefaultScheduler_Worker-8] Template processing error: "No nodes selected"
No nodes selected
The problematic instruction:
==> ${selectSingleNode('workspace://SpacesStore', 'lucene', 'PATH:"/app:company_home/app:dictionary/app:scripts/cm:send_mail.js"' )} [on line 1, column 1 in string://fixed]
Java backtrace for programmers:
freemarker.template.TemplateModelException: No nodes selected
at org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory$QueryForSingleNodeFunction.exec(FreeMarkerWithLuceneExtensionsModelFactory.java:180)
Could someone explain this to me? I have written a cron expression to run every 8 minutes, but it's not working.
"Run the extension" doesn't really make sense. When you start Tomcat, the Alfresco web application will load that Spring configuration file automatically because it is on the classpath (assuming you have set up your shared classloader correctly) and it ends in "context.xml". If you aren't seeing something you expect in the log files, check log4j.properties to make sure you have a logger set. If all else fails, use a remote debugger like Eclipse and set a breakpoint in one of the Java classes referred to by your context file.
Maybe your error is because of https://issues.alfresco.com/jira/browse/ALF-9981
