Oozie Coordinator config properties

I have an Oozie coordinator which calls a specific Oozie workflow.
In order to invoke this coordinator, I need to provide the workflow with all the config properties it requires. Does this mean I should be duplicating all the config properties that belong in the workflow's job.properties file and putting them in the coordinator.properties file as well? Or am I missing something?

The only properties file that Oozie loads is the one you provide when you call the Oozie command line tool, so yes, you will need to include the properties applicable to the workflow in the properties file that you provide when starting your coordinator.
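For illustration (the property names and paths here are hypothetical), a single coordinator.properties file that also carries the workflow's settings, passed as the only -config file, might look like this:

nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
oozie.coord.application.path=${nameNode}/apps/my-coordinator
# properties the underlying workflow expects
inputDir=${nameNode}/data/input

oozie job -oozie http://oozie-host:11000/oozie -config coordinator.properties -run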

Being new to the whole Hadoop environment and learning as I go, I would just like to add a bit of further clarification for others like me, to save them time searching.
When you just have a workflow, the Oozie job is started with the workflow's properties file.
When you have a coordinator that calls the workflow, the coordinator's properties file is used (the workflow's properties file is ignored).
When you have a bundle job that calls one or more coordinators (which in turn call the workflows), the bundle's properties file is used (the coordinators' and workflows' properties files are ignored).
Further to this, as mentioned previously, the properties file named in the oozie command is the one that is used.
Hope this clarifies things a bit more.


The transaction currently built is missing an attachment for class - Attempted to find a suitable attachment but could not find any in the storage

Full Error:
transactions.TransactionBuilder. - The transaction currently built is missing an attachment for class: com/gibtn/corda/printutilities/PrintLedgerTransaction. Attempted to find a suitable attachment but could not find any in the storage.
This has been asked here and here but I hope to get better clarification.
Problem:
I have built a set of libraries to perform common tasks in my Flows that I include in all my CorDapps. For now I just copy the JARs into each project, make some changes to the gradle files and everything works great.
I recently put together a small library for performing common tasks in Contracts and added the JAR the same way.
This works fine with MockNodes. But when I test with real nodes I will get this error in the CRaSH shell and the transaction will fail with a NoClassDefFoundError exception.
Question:
Is what I am doing even possible? Or do I always have to keep my utility classes inside the Contracts module in IntelliJ so they are bundled together with the Contracts into a single JAR? That way when the node starts the JAR (containing the Contracts and any utilities) is added to Attachment storage as a single Attachment.
I found a way to solve this. It's a bit dirty, but initial testing seems to work. I just created a blank class in my utilities JAR that implements Contract; its verify() method is empty. Now when the Corda node starts, it sees this Contract and adds the JAR to attachment storage. So from the CRaSH shell, if I run:
attachments trustInfo
...my utility JAR will be listed (it wasn't before). I can see that when I use one of the utility methods in a Contract, the utility JAR is included as a separate Attachment in the WireTransaction.
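For reference, a minimal sketch of such a placeholder contract (the package and class names are hypothetical, and this is only what my workaround amounts to, not an officially recommended pattern):

package com.example.utilities; // hypothetical package inside the utility JAR

import net.corda.core.contracts.Contract;
import net.corda.core.transactions.LedgerTransaction;

// Empty contract whose only purpose is to make the node treat this utility JAR
// as a contract JAR, so it is loaded into attachment storage at startup.
public class PlaceholderContract implements Contract {
    @Override
    public void verify(LedgerTransaction tx) {
        // intentionally empty: this contract never verifies anything
    }
}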
I'm not crazy about this solution and will probably stop using a utility JAR for Contracts. I'll go back to copying the classes into each project. Nevertheless there is a way to do it. I would just need a more experienced Corda developer to give it their blessing before I'd go forward into production with it.

order of precedence of Sling run modes

I am unsure about this question:
Question: What is the correct order of precedence for setting up run modes in AEM? (From left to right, left being the highest)
A. System property, Sling properties file, jar file
B. jar file, sling properties file, system property
C. Sling properties file, jar file, system property
D. jar file, System property, Sling properties file
Answer : B
I have gone through various docs and run multiple experiments on this.
According to the Adobe documentation, the order is: Sling.properties, system property, jar file.
However, this Adobe doc has a contradictory opinion: jar file, sling.properties, system property.
Also, the Apache Sling doc says that any property set with the -D option in the form n=v overwrites the same-named property in the sling.properties file, which means the system property has higher precedence than sling.properties.
Now, those are all according to the docs; here is what I experimented with:
I created the path ${dir}/crx-quickstart/conf, created a sling.properties file there and wrote sling.run.modes=publish in it. Then I renamed the jar file to cq-author-7502.jar and ran it with the command java -jar cq-author-7502.jar -Dsling.run.modes=prod
This is my observation:
1. When the jar runs, the message "Setting 'sling.run.modes' to 'publish' from sling.properties" is shown in the terminal.
2. The instance comes up in author mode.
3. When I checked the instance run mode in the Felix console, it was prod.
I am totally confused about the order of precedence, as everything seems contradictory to me.
I would be grateful if anyone could shed some light on it.
Thank you
I think it depends on when we are checking the run mode precedence (at installation time or later on a running instance) and how we are starting our instance. There are two kinds of run modes: installation-time run modes and custom run modes.
Installation-time run mode - As explained by the official run modes documentation and setup instructions, this can be set only once, at installation time. It includes author, publish, nosamplecontent, samplecontent.
Custom run mode - Your own customized run modes, e.g. dev, qa, prod, etc.
I did some tests (AEM 6.1); precedence works in the following way:
Initial setup
Start the jar (by double-clicking) - Here you have no option to set the run mode in sling.properties or a start script the first time; the JAR name takes precedence.
Unpack the jar and specify the run mode as a system property in the start script - The JAR name does not come into the picture here, and there is no option to set the run mode in sling.properties; the system property takes precedence.
Running instance
Even if we change the run mode in the JAR name, it does not change the installation-time run mode. For custom run modes, the JAR file name is not applicable; the order of precedence is sling.properties -> the -r option (command line jar option) -> system properties (start script).
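For reference, the mechanisms referred to above look roughly like this (the run mode values are just examples):

# entry in crx-quickstart/conf/sling.properties
sling.run.modes=author,dev
# -r option on the quickstart jar
java -jar cq-quickstart.jar -r author,dev
# system property, e.g. in the start script
java -Dsling.run.modes=author,dev -jar cq-quickstart.jar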
As far as the question goes (it seems to be an AEM certification question), it is not clear which context they are asking about. The Helpx article is contributed by the community, so its context might be different. As per the Sling documentation link, the Launchpad version used in AEM seems to be older, not 2.4.0. Need to ask Adobe to confirm :).
There are two Adobe articles that say quite different, conflicting things:
Article 1: (Assumed more recent)
Starting CQ with a specific run mode If you have defined
configurations for multiple run modes then you need to define which is
to be used upon startup. There are several methods for specifying
which run mode to use; the order of resolution is:
sling.properties file
-r option
system properties (-D)
Filename detection
From this Reference: Configure Run Modes
- the answer is C
Article 2:
Behavior when run modes are specified more than one way The run mode
specified in the naming of the jar file takes precedence. If run modes
are not specified in the naming of the jar file, the values in the
sling.properties file are used. If run modes are not specified in
either the naming of the jar file or the sling.properties file, the
system property (or JVM argument) is used.
From this Reference: Configure Run Modes
- the answer is B
However, based on my experience and by process of elimination, I'd go with answer B.

Can I run code at Alfresco startup?

I have an Alfresco module that I would like to have do some cleanup when a new version of it is installed.
In the current situation, an older version of the module created a folder node with custom properties at the root of the repository. We've since decided to have multiple such nodes, and none of them at that location. I'd like to put into the next version of the module code that would run at Alfresco startup, check for the existence of the old node, copy its properties into the appropriate new nodes, and delete the old node.
Is such a thing possible? I've looked at the Bootstrap configuration file, but that appears to only allow one to add things to the repository, not modify or delete them.
My suggestion is that you write a patch, that is, a class that extends
org.alfresco.repo.admin.patch.AbstractPatch
Then you can do pretty much anything you want on bootstrap (except executing searches against Solr, since it won't be available yet).
Add some Spring configuration; take a look at the file patch-services-context.xml for inspiration.
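As a rough, untested sketch (the class name and the returned message are made up), such a patch could look like the following; the node lookup, property copying and deletion would use the repository services injected through that Spring configuration:

import org.alfresco.repo.admin.patch.AbstractPatch;

// Hypothetical one-off patch that migrates the legacy root folder node.
public class LegacyFolderCleanupPatch extends AbstractPatch {

    @Override
    protected String applyInternal() throws Exception {
        // 1. locate the old folder node (e.g. via the injected nodeService)
        // 2. copy its custom properties onto the appropriate new nodes
        // 3. delete the old node, e.g. nodeService.deleteNode(oldNodeRef)
        return "Migrated properties from the legacy folder node and removed it";
    }
}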
Yes, you can do that; you probably just missed the right place in the documentation:
If you open Import Strategy you'll find a section Per BootstrapView; you should be using something like REPLACE_EXISTING or UPDATE_EXISTING for your ACP-packaged content (if you're using ACPs as your bootstrap importing strategy).
Here is a more detailed description of the UUID Bindings values.
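For illustration only (the bean id, module id and paths are made up, and the bean definition is abridged), a bootstrap view carrying such a binding might be declared in the module's Spring context roughly like this:

<bean id="com.example.module.bootstrapContent"
      class="org.alfresco.repo.module.ImporterModuleComponent"
      parent="module.baseComponent">
  <property name="moduleId" value="com.example.module"/>
  <property name="name" value="bootstrapContent"/>
  <property name="bootstrapViews">
    <list>
      <props>
        <prop key="path">/</prop>
        <prop key="location">alfresco/module/com.example.module/bootstrap/content.acp</prop>
        <prop key="uuidBinding">UPDATE_EXISTING</prop>
      </props>
    </list>
  </property>
</bean>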
Hope that helps.
You can use patches.
When the Alfresco server starts, it applies patches and executes database updates, etc.
Definition :
A patch is a piece of Java code that executes once when Alfresco
Content Services starts. Custom patches can be implemented.
Documentation Link

Location of Schemas for Custom Action Executor for Oozie

As per the oozie documentation, I can create a Custom Oozie ActionExecutor.
To do this, I need to create a class that extends ActionExecutor. The class needs to be packaged as a Jar. This jar should be placed under the lib directory of oozie server. This part is clear.
However, where should I place the XSD file that defines the action? I searched the Oozie server locations and could not find a clue. Any help?
Per the Oozie documentation:
The XML schema (XSD) for the new Actions should be added to
oozie-site.xml, under the property
'oozie.service.WorkflowSchemaService.ext.schemas'. A comma separated
list for multiple Action schemas.
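So, assuming the XSD itself is on the Oozie server's classpath (for example, bundled inside the same JAR as the ActionExecutor), the registration in oozie-site.xml would look roughly like this (the schema file name is made up):

<property>
  <name>oozie.service.WorkflowSchemaService.ext.schemas</name>
  <value>my-custom-action-0.1.xsd</value>
</property>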

Overriding default hadoop jars in class path

I've seen many manifestations of ways to give the user classpath precedence over the Hadoop one. Oftentimes this is done when an M/R job needs a specific version of a library of which Hadoop already happens to use an older version (for example Jackson's JSON parser, Commons HTTP, etc.).
In any case, I've seen:
mapreduce.task.classpath.user.precedence
mapreduce.task.classpath.first
mapreduce.job.user.classpath.first
Which one of these parameters is the right one to set in my job configuration in order to force mappers and reducers to have a classpath that puts my user-defined HADOOP_CLASSPATH jars BEFORE the Hadoop default dependency jars?
By the way, this is related to this question :
Dynamodb requestHandler acception, which I have recently found is due to a jar conflict.
So, assuming you're using 0.20.203, this is handled in the TaskRunner.java code as follows:
The property you're looking for is on line 94 - mapreduce.user.classpath.first
Line 214 is where the call is made to build the list of classpaths, which delegates to a method called getClassPaths(..)
getClassPaths() is defined on line 524, and you should be able to see that the configuration property is used to decide on whether your job + dist cache libraries, or the hadoop libraries go on the classpath first
For other versions of Hadoop, you're best to check the TaskRunner.java class to confirm the name of the config property, since after all this is a "semi hidden config":
static final String MAPREDUCE_USER_CLASSPATH_FIRST =
"mapreduce.user.classpath.first"; //a semi-hidden config
In the latest Hadoop versions (2.2+), you should set:
conf.setBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
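Putting that call in context, a rough driver sketch (the class and job names are placeholders, and the mapper/reducer setup is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.MRJobConfig;

public class UserClasspathFirstDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // put the user/distributed-cache jars ahead of Hadoop's own jars on the task classpath
        conf.setBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
        Job job = Job.getInstance(conf, "user-classpath-first-example");
        // ... configure mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The same flag can also be passed on the command line as -Dmapreduce.job.user.classpath.first=true when the driver uses ToolRunner.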
These settings only work for referencing classes from external jars in your mapper or reducer tasks. If, however, you use them in, for example, a customized InputFormat, it will fail to load the class. A way to make sure this also works everywhere (in MR2) is to export this setting when submitting your job:
export HADOOP_USER_CLASSPATH_FIRST=true
I had the same issue and the parameter that worked for me on Hadoop Version 0.20.2-cdhu03 is "mapreduce.task.classpath.user.precedence"
This setting was tested and does not work on CDH3U3; the following answer is from the Cloudera team:
// JobConf job = new JobConf(getConf(), MyJob.class);
// job.setUserClassesTakesPrecedence(true);
http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapred/JobConf.html#setUserClassesTakesPrecedence%28boolean%29
In the MapR distribution, the property is "mapreduce.task.classpath.user.precedence"
http://www.mapr.com/doc/display/MapR/mapred-site.xml
<property>
  <name>mapreduce.task.classpath.user.precedence</name>
  <value>true</value>
  <description>Set to true if user wants to set different classpath. (AVRO)</description>
</property>
jobConf.setUserClassesTakesPrecedence(true);
