Hive looks for jar in HDFS private directory

How does ADD JAR work in Hive? When I add a local JAR file,
add jar /users/course/jars/json-serde-1.3.1.jar;
the Hive query fails and says it could not find the JAR at the same path on HDFS:
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/users/course/jars/json-serde-1.3.1.jar)
So I put the JAR into HDFS and added it using that HDFS path instead:
add jar hdfs://localhost/users/course/jars/json-serde-1.3.1.jar;
Now the Hive query says
File does not exist: hdfs://localhost:9000/private/var/folders/k5/bn104n8s72sdpg3tg7d8kkpc0000gn/T/a598a513-d7c9-4d55-9280-b6554487cac7_resources/json-serde-1.3.1.jar
I have no idea why it keeps looking for the JAR in the wrong places.

I believe Hive looks for the JAR locally, not on HDFS.
So if my home directory on the gateway server is
pwd
/home/my_username/
And the JAR is sitting locally at:
/home/my_username/hive_udfs/awesomeness.jar
Then I'd go into the Hive shell and run:
add jar /home/my_username/hive_udfs/awesomeness.jar;
At least, that works for me in my environment. HTH. Good luck! :)
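As a follow-up sanity check (my own addition, assuming the JAR really is at that local path on the gateway host), Hive can echo back the resources registered in the current session:
add jar /home/my_username/hive_udfs/awesomeness.jar;
list jars;
If the add succeeds, Hive copies the JAR into a session-local scratch directory, which is where the .../T/..._resources/... path in the second error above comes from.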

Related

Oozie: Is there anything that needs to be done after placing an updated jar under lib folder?

I am trying to place an updated jar under the lib path and remove the old jar. Unfortunately, I still see the old jar's log statements in the Oozie console. For confidentiality reasons I am unable to show the logs here, but I am doing the steps below:
Replacing the jar (mycode.jar) under the lib folder which is mentioned in workflow.xml
Submitting the Oozie job using oozie job -oozie http://host -config job.properties -run
When I look at the logs in the console, I still see the old jar's (older version of mycode.jar) output even though the jar has been replaced.
If you are talking about the lib directory in the Oozie workflow application, then you need not do anything. The next execution of the workflow will automatically pick up the new (updated) jar.
If you are updating jars in the share lib (/user/oozie/share/lib/lib_*/*), then after replacing the jar you need to execute the following command so the Oozie server refreshes its share lib:
oozie admin -sharelibupdate
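For example (my own illustration; assume the Oozie server URL is http://host:11000/oozie and the jar belongs to the hive share lib), you can refresh and then confirm what the server will actually distribute:
oozie admin -oozie http://host:11000/oozie -sharelibupdate
oozie admin -oozie http://host:11000/oozie -shareliblist hive
The updated jar should show up in the listing; if it does not, the replacement did not land in the directory the server is reading.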
Hope this will help. Thanks.
To make sure the issue is the same, I'll narrate what I was facing:
Created a MapReduce JAR and placed it in the lib folder.
Ran the Oozie (MapReduce action) job; it picked up the JAR as expected and ran fine.
I made some functionality changes in my code (JAR), so I added new log statements to make sure the new JAR was being picked up. Built the JAR and replaced the old JAR with the newly built one in the lib folder (HDFS).
Ran the Oozie job again; code from the old JAR was executed, because the new log statements did not show up.
After some searching I found the following tips:
Clear the YARN cache: found this on the Hortonworks site (https://community.hortonworks.com/articles/92339/how-to-clear-local-file-cache-and-user-cache-for-y.html) - pasting the content below for reference.
Short description:
To use a different version of a jar file with the same name, clear the cache on all NodeManager hosts to prevent the application from using the old jar.
a. Find out the cache location by checking the value of the yarn.nodemanager.local-dirs property:
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hadoop/yarn/local</value>
</property>
b. Remove the filecache and usercache folders located inside the directories specified in yarn.nodemanager.local-dirs.
[yarn@node2 ~]$ cd /hadoop/yarn/local/
[yarn@node2 local]$ ls
filecache  nmPrivate  spark_shuffle  usercache
[yarn@node2 local]$ rm -rf filecache/ usercache/
c. Restart the YARN service.
I was unable to clear the cache because I did not have the necessary access, so I followed the workaround below:
Rename the package or class. Since this package/class was written by me, I had the liberty to simply rename it; when Oozie looked up the new class name, the new functionality was executed automatically.
Option 2 may not be viable for many, and the question remains open as to why Oozie does not pick up the new JAR/class.

How to recover an overwritten directory in cloudera

I was using Hive in my Cloudera VM.
I used the command below to write the output of my HQL statement to an output file:
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/output'
SELECT * FROM City; ....
After getting the output I found that all my files in the cloudera directory got overwritten; I can see only my output file in that path.
Is there any way I can undo or recover the files that I've lost?
My hive.log file is below for reference.
You can try searching in the Hive trash folder (/user/hive/.Trash). I think you will find your lost files there.
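For example (my own illustration; the trash location depends on which user deleted the files, so check your own user's trash as well), you can browse the trash from the shell and copy anything you find back out:
hdfs dfs -ls -R /user/hive/.Trash
hdfs dfs -ls -R /user/cloudera/.Trash
hdfs dfs -cp /user/cloudera/.Trash/Current/path/to/file /desired/location
Keep in mind that the HDFS trash only holds files deleted from HDFS; files overwritten on the local filesystem by INSERT OVERWRITE LOCAL DIRECTORY are not moved there.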

File in executable jar cannot be found when running on AWS EC2

I have a .jar file executing on an AWS EC2 instance which contains the following code:
List<String> lines = FileUtils.readLines(new File("googlebooks-eng-all-1gram-20120701-k"));
The file exists in projectname/res and also directly in /projectname. I included /res in the build path. I can also see the file at the root of the jar when I export it from Eclipse.
If I run the jar locally on my PC it works fine, but if I run it on an EC2 instance it says:
java.io.FileNotFoundException: File 'googlebooks-eng-all-1gram-20120701-k' does not exist
How can that be?
On your PC it is reading the actual file from the filesystem - that is what new File means: a file on the filesystem.
To access a resource packaged inside a jar you need to call getResourceAsStream or something similar instead.
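A minimal sketch of the suggested change (my own illustration; it keeps the file name from the question and uses Commons IO's IOUtils, on the assumption that the library is already on the classpath since FileUtils is):
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.commons.io.IOUtils;

public class ResourceExample {
    public static void main(String[] args) throws IOException {
        // Look the resource up on the classpath (i.e. inside the jar), not on the filesystem.
        try (InputStream in = ResourceExample.class
                .getResourceAsStream("/googlebooks-eng-all-1gram-20120701-k")) {
            if (in == null) {
                throw new IOException("resource not found on the classpath");
            }
            // IOUtils.readLines reads an InputStream much like FileUtils.readLines reads a File.
            List<String> lines = IOUtils.readLines(in, StandardCharsets.UTF_8);
            System.out.println("Read " + lines.size() + " lines");
        }
    }
}
This behaves the same on the PC and on EC2, because the lookup never depends on the working directory.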

Running jar file map reduce without Hdfs

I have bundled a jar from my Eclipse project. I would like to pass arguments to the jar, basically an input file. I would like to know how to give it an input file that is not in HDFS. I know that's not how Hadoop normally works, but this is for testing purposes. Eclipse has this feature for local files. Is there a way to do this via the command line?
You can run hadoop in 'local' mode by overriding the job tracker and file system properties from the command line:
hadoop jar <jar-file> <main-class> -fs local -jt local <other-args..>
You need to be using the GenericOptionsParser (which is the norm if you're using ToolRunner to launch your jobs).
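For reference, a minimal ToolRunner-style driver looks roughly like this (the class and job names are made up for illustration); ToolRunner passes the arguments through GenericOptionsParser, so generic flags such as -fs and -jt are applied to the Configuration before run() sees the remaining arguments:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // args no longer contain -fs/-jt here; they were consumed and applied to getConf().
        Job job = Job.getInstance(getConf(), "my-job");
        job.setJarByClass(MyDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}
With a driver like this, hadoop jar my.jar MyDriver -fs local -jt local input.txt output/ reads input.txt from the local filesystem instead of HDFS.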

sqlite database connectivity in servlet

I am writing JDBC code in a servlet, but I am getting an error:
java.lang.ClassNotFoundException: org.sqlite.JDBC
I have included the sqlitejdbc-v056.jar file, but I am still getting the error.
If I write the same code in a separate Java file and run it as a Java application, it works properly, but it does not work on the server.
P.S. - I am using WebLogic server.
I have found a solution to my problem.
We have to add the .jar file to the lib folder inside the WEB-INF folder in our workspace:
-"Workspace/WebContent/WEB-INF/lib/"
We also need to include it in the 'plugins' folder of Eclipse:
-"eclipse/plugins/"
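For context, a minimal sketch of the driver-loading code that produces this error (my own illustration; the database path is a made-up placeholder). The ClassNotFoundException means the jar containing org.sqlite.JDBC is not visible to the web application's classloader, which is why it has to sit under WEB-INF/lib:
import java.sql.Connection;
import java.sql.DriverManager;

public class SqliteCheck {
    public static void main(String[] args) throws Exception {
        // Fails with ClassNotFoundException when the SQLite JDBC jar is not on the classpath.
        Class.forName("org.sqlite.JDBC");
        // Hypothetical database path, purely for illustration.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:/tmp/test.db")) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}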
