Running a jar file MapReduce job without HDFS - dictionary

I have bundled a jar from my Eclipse project and would like to pass arguments to it, basically an input file. I would like to know how to give it an input file that is not in HDFS. I know that's not how Hadoop normally works, but this is for testing purposes. Eclipse has a feature for running against local files. Is there a way to do this via the command line?

You can run Hadoop in 'local' mode by overriding the job tracker and file system properties from the command line:
hadoop jar <jar-file> <main-class> -fs local -jt local <other-args..>
You need to be using the GenericOptionsParser (which is the norm if you're using ToolRunner to launch your jobs).
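For context, here is a minimal sketch of such a driver; the class name, job name, and argument layout are illustrative, not taken from the question. Because the class goes through ToolRunner, GenericOptionsParser strips the generic flags (-fs, -jt, -D, ...) and applies them to the configuration before run() sees the remaining arguments.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; mapper/reducer setup is omitted for brevity.
public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // By the time run() is called, GenericOptionsParser has already consumed
        // generic options such as -fs local -jt local and applied them to getConf().
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCountDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // a local path when running in local mode
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}

You would then launch it as described above, e.g. hadoop jar myjob.jar WordCountDriver -fs local -jt local <local-input> <local-output>, keeping the generic options before your own arguments (jar and class names here are made up for the example).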

Related

CLI method to find all nested jar files containing the log4j library

I have read here that the log4j library can be "nested" within other files that are deployed with an application.
I can find files with 'log4j' in the filename but don't know how to find log4j in these "nested jars". Is there a way to do this from the command line?
Update
This question has moved to SuperUser here.
Run the following command to search for log4j jar files in an application:
dir /s /b <application_root>\*log4j*.jar
If any files are displayed, check the version number that is part of the file name. For example, Tomcat\webapps\ROOT\WEB-INF\lib\log4j-core-2.15.0.jar is version 2.15.
If the version is between 2 and 2.14 (2.15 is not vulnerable), the application is vulnerable to CVE-2021-44228 and one of the following mitigations must be applied.
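Note that a filename search will not catch log4j classes repackaged inside fat jars or WAR files, which is the "nested jar" case the question asks about. As a rough illustration only (not an official tool, and the entry-name pattern is an assumption), a small Java scan that also looks inside nested archives could look like this:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative scanner: walks a directory, opens every .jar/.war/.ear it finds,
// and recursively looks for log4j-core class files or nested archives.
public class NestedLog4jScan {

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]);   // e.g. the application root directory
        Files.walk(root)
             .filter(p -> p.toString().matches(".*\\.(jar|war|ear)$"))
             .forEach(p -> {
                 try (InputStream in = Files.newInputStream(p)) {
                     scan(in, p.toString());
                 } catch (IOException e) {
                     System.err.println("Could not read " + p + ": " + e.getMessage());
                 }
             });
    }

    private static void scan(InputStream in, String context) throws IOException {
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName();
            if (name.contains("org/apache/logging/log4j/core/")) {
                System.out.println(context + " contains " + name);
            } else if (name.matches(".*\\.(jar|war|ear)$")) {
                // recurse into the nested archive without extracting it to disk
                scan(zip, context + " -> " + name);
            }
        }
    }
}

It reports which archive (and which nested archive inside it) contains log4j-core classes; the version still has to be checked by inspecting the jar's manifest or file name.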

A .jar file does not run after building it

Actually, I'm trying to add a new language to Streama Media Server. I downloaded the source code and added a new language file (as guided here). After that, I wanted to build a jar from that project, which I did with IntelliJ IDEA 2021.1 (here is how I did it). When I run that jar file (on Ubuntu 20.04), it fails and gives this error: Screenshot
When you have made adjustments to the source code, it is likely that you will want to create a new .jar file and deploy it on your server. For this, you can use a simple command:
# for unix based systems
./gradlew assemble
# for windows
./gradlew.bat assemble
This will create 2 new .jar files under build/libs:
streama-{version}.jar
streama-{version}.jar.original
All you will need is the streama-{version}.jar.
This file is an executable, so you can just copy it into your deployment directory / your server and start it as usual.

SparkR: how to access files passed with --files in yarn-cluster mode

I am submitting a SparkR job to run on a YARN cluster in cluster mode with the ./bin/spark-submit script. I need to ship a file (an external dataset) via the --files option. This uploads the file to a temporary HDFS directory, but I need the path where the file was placed so I can reference it directly in my SparkR code.
For Java and PySpark, files distributed using --files can be accessed via the SparkFiles.get(filename) method, which returns the absolute path of filename. Is there an equivalent in SparkR?
I know we can work around the problem in different ways:
Put the files manually into HDFS
Deploy the files on the worker nodes
But I want to use this option for convenience.
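For reference (this illustrates the Java/PySpark mechanism mentioned above, not the SparkR equivalent being asked for), a minimal Java sketch; the app name and file name are made up for the example:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative only: shows how a file shipped with --files (or sc.addFile)
// is resolved to a local path via SparkFiles.get().
public class SparkFilesExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark-files-example");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // If the job was submitted with: spark-submit --files external_dataset.csv ...
        // the file is distributed with the job and its local path can be recovered:
        String localPath = SparkFiles.get("external_dataset.csv");
        System.out.println("Dataset is available locally at: " + localPath);

        sc.stop();
    }
}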

Hive looks in an HDFS private directory for the jar

How does add jar work in Hive? When I add a local jar file,
add jar /users/course/jars/json-serde-1.3.1.jar;
the Hive query fails and says it could not find the jar in HDFS, at the same path:
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/users/course/jars/json-serde-1.3.1.jar)
Then I put the jar into HDFS and ran add jar with that HDFS file path:
add jar hdfs://localhost/users/course/jars/json-serde-1.3.1.jar;
Now the Hive query says:
File does not exist: hdfs://localhost:9000/private/var/folders/k5/bn104n8s72sdpg3tg7d8kkpc0000gn/T/a598a513-d7c9-4d55-9280-b6554487cac7_resources/json-serde-1.3.1.jar
I have no idea why it keeps looking for the jar in the wrong places.
I believe Hive looks for the JAR locally, not on HDFS.
So if my home directory on the gateway server is
pwd
/home/my_username/
And the JAR is sitting locally at:
/home/my_username/hive_udfs/awesomeness.jar
Then I'd go into the hive shell and run:
add jar /home/my_username/hive_udfs/awesomeness.jar
At least, that works for me in my environment. HTH. Good luck! :)

Passing environment variables through jar file which app uses

I am currently trying out Docker links between my app and db containers. I've checked on my app container, and the environment variables are automatically set when I link the containers together.
What I want is for my config file, which is packaged into a jar file, to pick up those environment variables and use them for the required values. Any advice or help?
And this is how I create a config file in my jar file to connect to MySQL
database {
  url = "jdbc:mysql://${MYSQL_PORT_3306_TCP_ADDR}:${MYSQL_PORT_3306_TCP_PORT}/mydb"
  driver = "com.mysql.jdbc.Driver"
}
Updating the config file inside the jar would be quite overkill.
I think you have several choices:
read the environment variables directly in your program and either use them directly or generate the config file there (a sketch of this option follows below)
create a launch script (the details depend on your guest OS in Docker; sh/bash for Linux, etc.) that generates a new config file from the environment and puts it on the classpath before the jar, so your program sees it.
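A minimal sketch of the first option, shown in Java for illustration (the class name and fallback values are assumptions, not from the question): read the Docker-provided link variables at runtime and build the JDBC URL from them instead of baking it into the packaged config.

// Illustrative: read the link variables directly and build the JDBC URL,
// falling back to local defaults when the variables are absent.
public class DbConfigFromEnv {

    public static String jdbcUrl() {
        String host = getenvOrDefault("MYSQL_PORT_3306_TCP_ADDR", "127.0.0.1");
        String port = getenvOrDefault("MYSQL_PORT_3306_TCP_PORT", "3306");
        return "jdbc:mysql://" + host + ":" + port + "/mydb";
    }

    private static String getenvOrDefault(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl());   // e.g. jdbc:mysql://127.0.0.1:3306/mydb
    }
}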
EDIT: added example
You can save this kind of launcher script in the Docker image; it dynamically creates the configuration before launching the actual program.
#!/bin/bash
# some default values for testing even without links to other container
MYSQL_PORT_3306_TCP_ADDR=${MYSQL_PORT_3306_TCP_ADDR:-127.0.0.1}
MYSQL_PORT_3306_TCP_PORT=${MYSQL_PORT_3306_TCP_PORT:-3306}
# generate the config file from the environment, in a location the program can see
cat << EOF > /opt/yourprogram/dbconfig.conf
database { url="jdbc:mysql://${MYSQL_PORT_3306_TCP_ADDR}:${MYSQL_PORT_3306_TCP_PORT}/mydb" driver="com.mysql.jdbc.Driver"
}
EOF
# then launch the actual program with that directory on the classpath
scala -classpath /opt/yourprogram YourProgram
What I did was write the sh file in my directory /tmp/restcore-1.0-SNAPSHOT/bin like this:
#!/bin/bash
# regenerate the config from the environment and add it to the packaged jar
echo "database { url=\"jdbc:mysql://${MYSQL_PORT_3306_TCP_ADDR}:${MYSQL_PORT_3306_TCP_PORT}/mydb\" driver=\"com.mysql.jdbc.Driver\" }" > myconf.conf
jar uf /tmp/restcore-SNAPSHOT/lib/com.organization.restcore-1.0-SNAPSHOT.jar /tmp/restcore-1.0-SNAPSHOT/bin/myconf.conf
After building the Dockerfile and running the sh file in CMD, I used cat myconf.conf to check the config file and could see the environment variables filled in.
