Overriding default hadoop jars in class path - jar

I've seen many suggestions for ways to give the user class path precedence over the Hadoop one. This is often done when an M/R job needs a specific version of a library of which Hadoop happens to already use an older version (for example Jackson's JSON parser, Commons HTTP, etc.).
In any case, I've seen:
mapreduce.task.classpath.user.precedence
mapreduce.task.classpath.first
mapreduce.job.user.classpath.first
Which one of these parameters is the right one to set in my job configuration in order to force mappers and reducers to have a class path that puts my user-defined HADOOP_CLASSPATH jars BEFORE the default Hadoop dependency jars?
By the way, this is related to this question:
Dynamodb requestHandler exception, which I recently found is due to a jar conflict.

So, assuming you're using 0.20.203, this is handled in the TaskRunner.java code as follows:
The property you're looking for is on line 94 - mapreduce.user.classpath.first
Line 214 is where the call is made to build the list of classpaths, which delegates to a method called getClassPaths(..)
getClassPaths() is defined on line 524, and you should be able to see that the configuration property is used to decide whether your job + distributed cache libraries, or the Hadoop libraries, go on the classpath first.
For other versions of Hadoop, you're best off checking the TaskRunner.java class to confirm the name of the config property; after all, this is a "semi hidden config":
static final String MAPREDUCE_USER_CLASSPATH_FIRST =
"mapreduce.user.classpath.first"; //a semi-hidden config

In the latest Hadoop versions (2.2+), you should set:
conf.setBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);

These settings work for referencing classes from external jars only in your mapper or reducer tasks. If, however, you are using them in, for example, a customized InputFormat, it will fail to load the class. A way to make sure this also works everywhere (in MR2) is to export this setting when submitting your job:
export HADOOP_USER_CLASSPATH_FIRST=true

I had the same issue and the parameter that worked for me on Hadoop Version 0.20.2-cdhu03 is "mapreduce.task.classpath.user.precedence"
I tested this setting and it did not work on CDH3U3; the following answer is from the Cloudera team:
// JobConf job = new JobConf(getConf(), MyJob.class);
// job.setUserClassesTakesPrecedence(true);
http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapred/JobConf.html#setUserClassesTakesPrecedence%28boolean%29
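Uncommented, a minimal driver sketch around that CDH-specific call might look like this (MyJob is a hypothetical class; the mapper/reducer and I/O setup are omitted):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), MyJob.class);
        // CDH-specific API: put user jars ahead of the Hadoop system jars
        // on the task classpath.
        job.setUserClassesTakesPrecedence(true);
        // ... set mapper, reducer, input and output paths here ...
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyJob(), args));
    }
}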

In the MapR distribution, the property is "mapreduce.task.classpath.user.precedence"
http://www.mapr.com/doc/display/MapR/mapred-site.xml
<property>
  <name>mapreduce.task.classpath.user.precedence</name>
  <value>true</value>
  <description>Set to true if user wants to set different classpath. (AVRO)</description>
</property>
jobConf.setUserClassesTakesPrecedence(true);

Related

The transaction currently built is missing an attachment for class - Attempted to find a suitable attachment but could not find any in the storage

Full Error:
transactions.TransactionBuilder. - The transaction currently built is missing an attachment for class: com/gibtn/corda/printutilities/PrintLedgerTransaction. Attempted to find a suitable attachment but could not find any in the storage.
This has been asked here and here but I hope to get better clarification.
Problem:
I have built a set of libraries to perform common tasks in my Flows that I include in all my CorDapps. For now I just copy the JARs into each project, make some changes to the gradle files and everything works great.
I recently put together a small library for performing common tasks in Contracts and added the JAR the same way.
This works fine with MockNodes. But when I test with real nodes I get this error in the CRaSH shell and the transaction fails with a NoClassDefFoundError exception.
Question:
Is what I am doing even possible? Or do I always have to keep my utility classes inside the Contracts module in IntelliJ so they are bundled together with the Contracts into a single JAR? That way when the node starts the JAR (containing the Contracts and any utilities) is added to Attachment storage as a single Attachment.
I found a way to solve this. It's a bit dirty, but initial testing seems to work. I just created a blank class in my utilities JAR that implements Contract. Its verify() method is empty. Now when the Corda node starts it sees this Contract and adds the JAR to attachment storage. So from the CRaSH shell, if I run:
attachments trustInfo
...my utility JAR will be listed (it wasn't before). I can see that when I use one of the utility methods in a Contract, the utility JAR is included as a separate Attachment in the WireTransaction.
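A minimal sketch of that blank marker Contract, assuming Corda's public Contract API (the class name is hypothetical; the package mirrors the utility package from the error above):

package com.gibtn.corda.printutilities;

import net.corda.core.contracts.Contract;
import net.corda.core.transactions.LedgerTransaction;

// Empty "marker" contract: its only purpose is to make the node treat this
// utilities JAR as a contract JAR and load it into attachment storage.
public class DummyAttachmentContract implements Contract {
    @Override
    public void verify(LedgerTransaction tx) {
        // Intentionally empty - no verification logic.
    }
}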
I'm not crazy about this solution and will probably stop using a utility JAR for Contracts. I'll go back to copying the classes into each project. Nevertheless there is a way to do it. I would just need a more experienced Corda developer to give it their blessing before I'd go forward into production with it.

order of precedence of Sling run modes

I have a doubt over this question
Question: What is the correct order of precedence for setting up run modes in AEM? (From left to right, left being the highest.)
A. System property, Sling properties file, jar file
B. jar file, sling properties file, system property
C. Sling properties file, jar file, system property
D. jar file, System property, Sling properties file
Answer : B
I have gone through various docs and done multiple experiments on this.
According to the Adobe documentation, the order is: sling.properties, system property, jar file.
Meanwhile, this Adobe doc has a contradictory opinion: jar file, sling.properties, system property.
Also, the Apache Sling docs say that any property set with the -D option in the form n=v overwrites same-named properties in the sling.properties file, which means the system property has higher precedence than sling.properties.
Now, that is all according to the docs; what I experimented with is:
I made the path ${dir}/crx-quickstart/conf, created a file sling.properties, and wrote sling.run.modes=publish in it. Then I renamed the jar file to cq-author-7502.jar and ran the jar with the command java -jar cq-author-7502.jar -Dsling.run.modes=prod
This is my observation:
1. When the jar runs, the message "Setting 'sling.run.modes' to 'publish' from sling.properties." is shown in the terminal.
2. The instance came up in author mode.
3. When I checked the run mode in the Felix console, it was prod.
I am totally confused about the order of precedence, as everything seems contradictory to me.
I would be grateful if anyone could shed some light on it.
Thank you
I think it depends on when we are checking the run mode precedence (at installation time or later on a running instance) and how we are starting our instance. There are two kinds of run modes: installation-time run modes and custom run modes.
Installation-time run mode - As explained in the official run modes documentation and setup instructions, this can be set only once, at installation time. This includes author, publish, nosamplecontent, samplecontent.
Custom run mode - your own customized run modes, e.g. dev, qa, prod, etc.
I did some tests (AEM 6.1); precedence works in the following way:
Initial setup
Start the jar (by double clicking) - here you do not have the option to set the run mode in sling.properties or a start script the first time. The JAR name takes precedence.
Unpack the jar and specify the run mode as a system property in the start script - the JAR name doesn't come into the picture here, and you do not have the option to set the run mode in sling.properties. System properties take precedence.
Running instance
Even if we change the run mode in the JAR name, it doesn't change the installation-time run mode. For custom run modes, the JAR file name is not applicable. The order of precedence is sling.properties -> the -r option (command-line jar option) -> system properties (start script).
As far as the question goes (it seems to be an AEM certification question), the context in which they are asking is not clear. The Helpx article is contributed by the community, so its context might be different. As for the Sling documentation link, it seems the Launchpad version in AEM is older, not 2.4.0. Need to ask Adobe to confirm :).
There are two conflicting Adobe articles that say quite different things:
Article 1: (Assumed more recent)
Starting CQ with a specific run mode
If you have defined configurations for multiple run modes then you need to define which is to be used upon startup. There are several methods for specifying which run mode to use; the order of resolution is:
sling.properties file
-r option
system properties (-D)
Filename detection
From this Reference: Configure Run Modes
- the answer is C
Article 2:
Behavior when run modes are specified more than one way
The run mode specified in the naming of the jar file takes precedence. If run modes are not specified in the naming of the jar file, the values in the sling.properties file are used. If run modes are not specified in either the naming of the jar file or the sling.properties file, the system property (or JVM argument) is used.
From this Reference: Configure Run Modes
- the answer is B
However, based on my experience and on process of elimination, I'd go with answer B.

Can I run code at Alfresco startup?

I have an Alfresco module that I would like to have do some cleanup when a new version of it is installed.
In the current situation, an older version of the module created a folder node with custom properties at the root of the repository. We've since decided to have multiple such nodes, and none of them at that location. I'd like to put into the next version of the module code that would run at Alfresco startup, check for the existence of the old node, copy its properties into the appropriate new nodes, and delete the old node.
Is such a thing possible? I've looked at the Bootstrap configuration file, but that appears to only allow one to add things to the repository, not modify or delete them.
My suggestion is that you write a patch, that is, a class that extends
org.alfresco.repo.admin.patch.AbstractPatch
Then you can do pretty much anything you want at bootstrap (except executing searches against Solr, since it won't be available yet).
Add some Spring configuration; take a look at the file patch-services-context.xml for inspiration.
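A rough sketch of such a patch, assuming the AbstractPatch base class mentioned above (the class name and the migration steps are hypothetical):

package com.example.patches;

import org.alfresco.repo.admin.patch.AbstractPatch;

public class MigrateLegacyFolderPatch extends AbstractPatch {

    @Override
    protected String applyInternal() throws Exception {
        // Runs once at startup: look up the old root folder node here,
        // copy its properties onto the new nodes, then delete it using the
        // services (e.g. the node service) injected via the Spring bean
        // definition for this patch.
        return "Legacy folder node migrated";
    }
}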
Yes, you can do that; you probably missed the correct place in the documentation:
If you open Import Strategy you'll find a section Per BootstrapView; you should be using something like REPLACE_EXISTING or UPDATE_EXISTING for your ACP-packaged content (if you're using ACPs as your bootstrap importing strategy).
Here is a more detailed description of the UUID Bindings values.
Hope that helps.
You can use patches.
When the Alfresco server starts, it applies patches and executes database updates, etc.
Definition :
A patch is a piece of Java code that executes once when Alfresco
Content Services starts. Custom patches can be implemented.
Documentation Link

Editing configuration files in Pax Exam

I am using Pax Exam to perform integration tests on my OSGi application. I have a configuration factory in which I specify the Karaf feature of my application to be installed in the test container, and I then modify a property of a .cfg file installed as part of my feature.
public class TestConfigurationFactory implements ConfigurationFactory {
    @Override
    public Option[] createConfiguration() {
        return options(
                karafDistributionConfiguration()
                        .frameworkUrl(
                                maven().groupId("org.apache.karaf")
                                        .artifactId("apache-karaf")
                                        .version("3.0.1").type("tar.gz"))
                        .unpackDirectory(new File("target/exam"))
                        .useDeployFolder(false),
                keepRuntimeFolder(),
                // Karaf (own) features.
                KarafDistributionOption.features(
                        maven().groupId("org.apache.karaf.features")
                                .artifactId("standard").classifier("features")
                                .version("3.0.1").type("xml"), "scr"),
                // CXF features.
                KarafDistributionOption.features(maven()
                        .groupId("org.apache.cxf.karaf")
                        .artifactId("apache-cxf").version("2.7.9")
                        .classifier("features").type("xml")),
                // Application features.
                KarafDistributionOption.features(
                        maven().groupId("com.me.project")
                                .artifactId("my-karaf-features")
                                .version("1.0.0-SNAPSHOT")
                                .classifier("features").type("xml"), "my-feature"),
                KarafDistributionOption.editConfigurationFilePut(
                        "etc/com.me.test.cfg", "key", "value"));
    }
}
The property I specify in editConfigurationFilePut is modified correctly, however the rest of the .cfg file's properties are deleted. If I use the editConfigurationFilePut method to edit one of Karaf's configuration files it works as expected (just adds the new property without modifying the existing ones) so I am thinking that perhaps the problem is that Pax Exam attempts to modify the configuration before the .cfg file is installed by my feature and therefore creates a new file to put the property in. If this is the case is there some way to synchronise this process so that the .cfg file is edited only after the feature is properly installed?
There are two different reasons for this:
1) The feature gets installed after the config file has been "edited"
2) The feature only contains a config section and not a configfile section
I'd guess reason one is the most likely cause, since Pax Exam needs a running Karaf to install a feature. So to work around reason one, replace the config with a config file present in your test project.
For reason two, make sure the feature actually references a config file instead of a config-admin config, or add your properties to the configuration of the config-admin service. You can achieve this by injecting the ConfigAdmin service in your unit test and adding your properties to the configuration PID.
EDIT:
Combine both solutions.
Since, because of 1), it takes longer for the config file to actually be available, let the config-admin service do the rest.
Make sure your test retrieves the config-admin service, either by injecting it or by waiting for its availability.
Now, within an @Before method, wait until your config is complete and change it from there. This way you don't need to duplicate the config files.
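A rough sketch of that approach, assuming the standard OSGi ConfigurationAdmin API and the com.me.test PID from the .cfg file above (the test class name is hypothetical and the wait/retry logic is omitted):

import java.util.Dictionary;
import java.util.Hashtable;

import javax.inject.Inject;

import org.junit.Before;
import org.osgi.service.cm.Configuration;
import org.osgi.service.cm.ConfigurationAdmin;

public class MyFeatureIT {

    // Pax Exam injects the service once it becomes available in the container.
    @Inject
    private ConfigurationAdmin configAdmin;

    @Before
    public void overrideConfig() throws Exception {
        // Fetch (or create) the configuration bound to the feature's PID and
        // add/override the property without wiping the existing ones.
        Configuration config = configAdmin.getConfiguration("com.me.test", null);
        Dictionary<String, Object> props = config.getProperties();
        if (props == null) {
            props = new Hashtable<>();
        }
        props.put("key", "value");
        config.update(props);
    }
}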

External classes/jar in OSGi

My application supports running on many DBMSs; it requires the user to configure the DBMS connection settings and also provide the JDBC jar file.
Now the application is to be packaged as an OSGi bundle. There will be another main jar which launches the OSGi server and starts the application as a bundle.
Can you please suggest how I can package the application as a bundle and let the user provide the JDBC jar file?
Will it require something like the main launcher jar specifying the JDBC driver classes via the FRAMEWORK_SYSTEMPACKAGES property?
Thanks in advance,
Aman
There are two ways of doing this:
1) Adding the driver.jar to the classpath of the main launcher and, as you say, exposing its packages via the framework by specifying that property (or, rather, you can use the FRAMEWORK_SYSTEMPACKAGES_EXTRA property to specify only the additional packages instead of listing all of them); see the launcher sketch after this list.
2) Manually wrapping the driver.jar as a bundle, or doing it dynamically at runtime. For example, you could try to wrap bundles that are copied to a certain folder (similar to what Apache Felix File Install does) by using Pax URL or some other tool that can create a bundle out of an ordinary jar file for you (see http://team.ops4j.org/wiki/display/paxurl/Pax+URL).
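For option 1, a minimal launcher sketch assuming the standard OSGi framework launch API (the driver package name and version are only examples):

import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

import org.osgi.framework.Constants;
import org.osgi.framework.launch.Framework;
import org.osgi.framework.launch.FrameworkFactory;

public class Launcher {

    public static void main(String[] args) throws Exception {
        Map<String, String> config = new HashMap<>();
        // The driver jar sits on the launcher's classpath; export its packages
        // through the system bundle so application bundles can import them.
        config.put(Constants.FRAMEWORK_SYSTEMPACKAGES_EXTRA,
                "com.mysql.jdbc;version=\"5.1.0\"");

        FrameworkFactory factory =
                ServiceLoader.load(FrameworkFactory.class).iterator().next();
        Framework framework = factory.newFramework(config);
        framework.start();
        // ... install and start the application bundle(s) here ...
        framework.waitForStop(0);
    }
}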
