Hive query execution for custom UDF is expecting HDFS jar path instead of local path in CDH4 with Oozie flow - oozie

We are migrating from CDH3 to CDH4 and, as part of this migration, we are moving all the jobs that we have on CDH3. We have noticed one critical issue: when a workflow is executed through Oozie to run a Python script that internally invokes a Hive query (hive -e {query}), the query adds a custom jar using add jar {LOCAL PATH FOR JAR} and creates a temporary function for the custom UDF. Everything looks fine up to this point, but when the query starts executing with the custom UDF function it fails in the distributed cache with a FileNotFoundException, looking for the jar at an HDFS path instead of the local path.
I am not sure if I am missing some configuration here.
Exception Trace:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files. Execution log at:
/tmp/yarn/yarn_20131107020505_79b41443-b9f4-4d36-a0eb-4f0d79cd3ce9.log
java.io.FileNotFoundException: File does not exist:
hdfs://aa.bb.com:8020/opt/nfsmount/mypath/custom.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:824)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
..... .....
Any help on this is highly appreciated.
Regards,
GHK.

There are a few options. All the required jars should be on the classpath before you run the Hive query.
Option 1: Add your custom jar with <file>/hdfs/path/to/your/jar</file> in the Oozie workflow.
Option 2: Use the --auxpath /local/path/to/your/jar option when calling Hive from your Python script, e.g. hive --auxpath /local/path/to/your.jar -e {query}
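As a rough sketch of option 2 (the jar path, UDF class, and table name are placeholders, since the original script is not shown), the Python script could shell out to Hive with --auxpath like this:
import subprocess

# Hypothetical sketch: pass the local jar via --auxpath so it is on Hive's
# classpath, instead of relying on "add jar <local path>" inside the query.
jar_path = "/local/path/to/custom.jar"
query = (
    "CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'; "
    "SELECT my_udf(col) FROM my_table;"
)
subprocess.check_call(["hive", "--auxpath", jar_path, "-e", query])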

Related

Installing local .whl files on Databricks cluster

I am trying to connect to a Databricks cluster and install a local Python .whl using DatabricksSubmitRunOperator on Airflow (v2.3.2) with the following configuration. However, it doesn't work and throws a FileNotFound exception (I checked the file path multiple times; the file exists).
task1 = DatabricksSubmitRunOperator(
    task_id = <task_id>,
    job_name = <job_name>,
    existing_cluster_id = <cluster_id>,
    libraries=[
        {"whl": "file:/<local_absolute_path>"}
    ]
)
While the official documentation states that, for .whl files, only DBFS and S3 storage are supported, in Airflow I see the following error message when the file:/ prefix is not attached:
Library installation failed for library due to user error.
Error messages: Python wheels must be stored in dbfs, s3, adls, gs or as a local file. Make sure the URI begins with 'dbfs:', 'file:', 's3:', 'abfss:', 'gs:'
Is it possible to install local .whl files on a Databricks cluster?
An alternative approach I tried is to copy the .whl to DBFS storage and install it from there. The problem with that is that the installation status gets stuck at "pending".
Any help is appreciated.
You can directly install or upload the .whl file from the cluster's Libraries tab in the Databricks UI,
or
follow the official documentation on installing .whl packages.
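If the cluster UI is not an option, here is a minimal sketch of the DBFS route mentioned in the question (the DBFS path, task id, and notebook path below are placeholders): copy the wheel to DBFS first and reference it with a dbfs: URI.
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Sketch only: assumes the wheel was already uploaded to DBFS, e.g. with
#   databricks fs cp my_pkg-0.1-py3-none-any.whl dbfs:/FileStore/wheels/
task1 = DatabricksSubmitRunOperator(
    task_id="install_wheel_and_run",
    existing_cluster_id="<cluster_id>",
    libraries=[{"whl": "dbfs:/FileStore/wheels/my_pkg-0.1-py3-none-any.whl"}],
    notebook_task={"notebook_path": "/Shared/placeholder_notebook"},
)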

OpenEdge 10.2A - Unable to Run the Added Procedure in Custom .pl File

I have added a new .p procedure (prodict/myProc.p) to the prodict.pl file and saved the prodict.pl file under my program's root folder.
Also, I have added the path of the folder to the PROPATH, and it is the first item in the PROPATH.
In order to run the procedure, in the Procedure Editor, I try to run it using the code below:
RUN prodict/myProc.p
The error message I receive is:
How can I make my procedure run?
Note: I'm trying this in order to create a custom prodict/load_df.p, so it can be run without the need for any user interaction. My older question can be found here.
Instead of RUN "prodict/myProc.p", I have used
RUN value(search("prodict/myProc.p")).
The error message changed to:
Cannot run procedure file prodict/myProc.p from library. (1976)
When we look at the error description:
Can't run procedure file from library. (1976)
The named file in the library reference (e.g., progname.p in libname.pl) in the RUN statement is a source file, and cannot be run. Only PROGRESS r-code files can be run from a library.
Solution: I have added the compiled .r file to the library through proenv:
prolib prodict.pl -add prodict/myProc.r
and changed the calling code to:
RUN "prodict/myProc.r".
Now the code runs. Thanks to Stefan Drissen for showing me a way to get the real error message.

XQuery process:execute: how to execute an external program?

I am running eXist-db on Windows and would like to execute an external Windows program.
This works inside the normal windows shell:
C:\path\to\webGLRtiMaker.exe C:\path\to\ImageFile.rti -q 90
I would like to execute the same program from my XQuery script (I have uploaded all the needed files to my eXist-db according to the paths specified below):
xquery version '3.1';
import module namespace process="http://exist-db.org/xquery/process" at "java:org.exist.xquery.modules.process.ProcessModule";
declare variable $options := '<options>
<workingDir>/db/apps/test-project/images</workingDir>
<stdin><line>/db/apps/execute-test/images/image1.rti -q 90</line></stdin>
</options>';
(:process:execute($webRtiMaker, <options/>):)
process:execute('/db/apps/execute-test/resources/RTIMaker/webGLRtiMaker.exe', $options)
Even if I only execute the program without parameters (when I run it inside Windows without parameters I get a parameter overview in the command prompt, so I should at least receive some kind of output):
process:execute('/db/apps/execute-test/resources/RTIMaker/webGLRtiMaker.exe', <options/>)
But I get the error:
exerr:ERROR An IO error occurred while executing the process /db/apps/execute-test/resources/RTIMaker/webGLRtiMaker.exe: Cannot run program "/db/apps/execute-test/resources/RTIMaker/webGLRtiMaker.exe": CreateProcess error=2, The System cannot find the file ...
I used this as reference: Execute External Process
What am I doing wrong?
I have not tried this recently, but try the following:
import module namespace process="http://exist-db.org/xquery/process" at "java:org.exist.xquery.modules.process.ProcessModule";
let $cmd := 'C:\path\to\webGLRtiMaker.exe C:\path\to\ImageFile.rti -q 90'
return
<results>{process:execute($cmd, <options/>)}</results>
There is an article at the XQuery WikiBook about it.
Unfortunately it is not possible to start an executable that is stored inside the database. The Java API requires direct access to a file on the filesystem, and a '/db/...' path does not point to one.

AWS CodeBuild failure on getting source

I have a CodeBuild project that works fine.
When I try to use it in CodePipeline, it fails with an empty Repository and Submitter.
The failure logs are as simple as:
01:34:17
[Container] 2018/03/08 01:34:10 Waiting for agent ping

01:34:17
[Container] 2018/03/08 01:34:12 Waiting for DOWNLOAD_SOURCE
There are no settings anywhere to adjust the CodeBuild phase.
How can I fix/customise it?
Recreate the build project from within CodePipeline, so it receives the source code from the provider called "CodePipeline".
Source of the information: https://apassionatechie.wordpress.com/2018/02/08/codebuild-aws-from-codepipeline-aws/
Just in case somebody needs an answer:
The issue was imprecise file naming in the CodeBuild stage, which meant CodeDeploy, in its turn, was not able to pull the ZIP file.
As a fix, I added an extra command to buildspec.yml:
post_build:
  commands:
    - zip -r Application.zip target/Application-0.0.1.war

Documentum NPE when running as jar

I'm writing a simple application to create a Documentum folder structure from a directory structure on disk. When I run the application through SpringSource Tool Suite, it works fine. When I package it as a jar, with all dependencies, and run it, I receive the following error:
java.lang.NullPointerException
at com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:372)
at com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:333)
at com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:41)
at com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64)
at com.documentum.fc.common.DfPreferences.getInstance(DfPreferences.java:43)
at com.documentum.fc.client.DfSimpleDbor.getDefaultDbor(DfSimpleDbor.java:78)
at com.documentum.fc.client.DfSimpleDbor.<init>(DfSimpleDbor.java:66)
at com.documentum.fc.client.DfClient$ClientImpl.<init>(DfClient.java:344)
at com.documentum.fc.client.DfClient.<clinit>(DfClient.java:754)
Here is the line in my code where this error occurs:
IDfClient client = DfClient.getLocalClient();
The jar includes the dfc.properties file, which I specify on the command line using
-Ddfc.properties.file=dfc.properties.dev
For the record, the full command line looks like this (slightly anonymized):
java -Ddfc.properties.file=dfc.properties.dev -jar MyTest-jar-with-dependencies.jar baseDirectory baseDocumentumFolder
Thanks much for your time!
