When I tried to run the example code "kafka_wordcount.py" for Spark Streaming,
under the folder /usr/local/spark/examples/src/main/python/streaming,
the code itself gives the following instruction to execute it:
" $ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test
test is the topic name. But I cannot find the jar and the path:
" external/kafka-assembly/target/scala-/spark-streaming-kafka-assembly-.jar"
So instead I created a jars folder under ~/stream-example/ and put in it all the jars from
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-kafka-assembly_2.10%22, and then I ran
spark-submit --jars ~/stream-example/jars/spark-streaming-kafka-assembly_*.jar kafka_wordcount.py localhost:2181 topic
which fails with:
"Error: No main class set in JAR; please specify one with --class
Run with --help for usage help or --verbose for debug output"
What is wrong here? Where are the jars supposed to come from?
Thanks a ton!
This question was asked long ago, so I assume you have figured it out by now.
But, as I just had the same problem, I will post the solution that worked for me.
The deployment section of this guide (http://spark.apache.org/docs/latest/streaming-kafka-integration.html) says you can pass the library with the --packages argument, like below:
bin/spark-submit \
--packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test
You can also download the jar itself here: http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-kafka-assembly_2.10%22
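For instance, with a manually downloaded assembly jar the submit command would look roughly like this (a sketch; the exact version in the file name depends on what you downloaded):
bin/spark-submit \
--jars /path/to/spark-streaming-kafka-assembly_2.10-1.6.2.jar \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test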
Note: I didn't run the command above; I tested with this other example, but it should work the same way (the direct example connects to a Kafka broker list, typically on port 9092, rather than to ZooKeeper on port 2181):
bin/spark-submit \
--packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
examples/src/main/python/streaming/direct_kafka_wordcount.py \
localhost:9092 test
I'm currently playing around with Jaeger Query and trying to access its content through the API, which uses gRPC. I'm not familiar with gRPC, but my understanding is that I need to use the Python gRPC compiler (grpcio_tools.protoc) on the relevant proto file to get useful Python definitions. What I'm trying to do is find out ways to access Jaeger Query by API, without the frontend UI.
Currently, I'm very stuck on compiling the proto files. Every time I try, I get dependency issues (Import "fileNameHere" was not found or had errors.). The Jaeger query.proto file contains import references to files outside the repo. Whilst I can find these and manually collect them, they also have dependencies. I get the impression that following through and collecting each of these one by one is not how this was intended to be done.
Am I doing something wrong here? The direct documentation through Jaeger is limited for this. The below is my basic terminal session, before including any manually found files (which themselves have dependencies I would have to go and find the files for).
$ python -m grpc_tools.protoc --grpc_python_out=. --python_out=. --proto_path=. query.proto
model.proto: File not found.
gogoproto/gogo.proto: File not found.
google/api/annotations.proto: File not found.
protoc-gen-swagger/options/annotations.proto: File not found.
query.proto:20:1: Import "model.proto" was not found or had errors.
query.proto:21:1: Import "gogoproto/gogo.proto" was not found or had errors.
query.proto:22:1: Import "google/api/annotations.proto" was not found or had errors.
query.proto:25:1: Import "protoc-gen-swagger/options/annotations.proto" was not found or had errors.
query.proto:61:12: "jaeger.api_v2.Span" is not defined.
query.proto:137:12: "jaeger.api_v2.DependencyLink" is not defined.
Thanks for any help.
A colleague of mine provided the answer... It was hidden in the Makefile, which hadn't worked for me as I don't use Golang (and it was more complex than just installing Golang and running it, but I digress...).
The following .sh will do the trick. It assumes the query.proto file sits in a subdirectory below the script's location, under model/proto/api_v2/ (as it appears in the main Jaeger repo).
#!/usr/bin/env sh
set +x
# recreate the output directory that protoc writes into
rm -rf ./python_out 2> /dev/null
mkdir ./python_out
PROTO_INCLUDES="
-I model/proto \
-I idl/proto \
-I vendor/github.com/grpc-ecosystem/grpc-gateway \
-I vendor/github.com/gogo/googleapis \
-I vendor/github.com/gogo/protobuf/protobuf \
-I vendor/github.com/gogo/protobuf"
python -m grpc_tools.protoc ${PROTO_INCLUDES} --grpc_python_out=./python_out --python_out=./python_out model/proto/api_v2/query.proto
This will definitely generate the needed Python files, but they will still be missing dependencies.
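For the -I include paths above to resolve, the script has to run from the root of a Jaeger checkout that has its submodules and vendored Go dependencies in place; roughly (a sketch, assuming an older Jaeger revision that still commits vendor/; the script file name is hypothetical):
git clone --recurse-submodules https://github.com/jaegertracing/jaeger
cd jaeger
sh ../compile_protos.sh   # the script shown above, saved one level up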
I did the following to get the Jaeger gRPC Python APIs:
git clone --recurse-submodules https://github.com/jaegertracing/jaeger-idl
cd jaeger-idl/
make proto
Use the files inside proto-gen-python/.
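To make the generated modules importable from your own code, pointing PYTHONPATH at the output directory should be enough (a sketch; the exact module names inside proto-gen-python/ are an assumption, so check what make proto actually produced):
export PYTHONPATH="$PWD/proto-gen-python:$PYTHONPATH"
python -c "import query_pb2_grpc"   # hypothetical module name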
Note:
While importing the generated code, if you face the error:
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
Do:
pip3 install --upgrade pip
pip3 install --upgrade protobuf
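If you'd rather not touch the system-wide packages, the same upgrade works inside a virtual environment:
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip protobuf grpcio grpcio-tools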
I was able to use ZuluFX 8 with javapackager on Windows. However, on a Mac I get this error:
Bundler Mac Application Image skipped because of a configuration problem: Cannot determine which JRE/JDK exists in the specified runtime directory.
Advice to fix: Point the runtime directory to one of the JDK/JRE root, the Contents/Home directory of that root, or the Contents/Home/jre directory of the JDK.
It's pretty easy to just move the package into Contents/Home but I doubt that will work as it seems there is no JRE bundled with the Mac version of ZuluFX 8. Is this something that can be worked around?
It's pretty easy to just move the package into Contents/Home but I doubt that will work as it seems there is no JRE bundled with the Mac version of ZuluFX 8.
From what I'm seeing, I'm not sure that's correct. The ZuluFX 8 archive for Mac contains a jre directory. I extracted the archive to ~/zuluFX and from there created the Contents/Home directory as required on macOS, with a symbolic link back to said jre directory inside it. I then set $JAVA_HOME accordingly:
$ pwd
/Users/cody/zuluFX
$ mkdir -p Contents/Home
$ cd Contents/Home
$ ln -s ../../jre .
$ cd ../..
$ export JAVA_HOME=~/zuluFX
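To sanity-check the layout before packaging (a sketch; the two commands should show the symlink and the Zulu version string respectively):
$ ls -l ~/zuluFX/Contents/Home   # expect: jre -> ../../jre
$ ~/zuluFX/jre/bin/java -version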
Then I utilized a simple javapackager example on GitHub to test its usage (I have no other JREs/JDKs installed on this box). The example app simply dumps Java properties and environment variables into a TextArea.
I had to modify the 3build script in the example to comment out its attempt to re-set $JAVA_HOME, but otherwise, it builds successfully, with the following javapackager command:
javapackager \
-deploy -Bruntime=${JAVA_HOME} \
-native image \
-srcdir . \
-srcfiles MacJavaPropertiesApp.jar \
-outdir release \
-outfile ${APP_DIR_NAME} \
-appclass MacJavaPropertiesApp \
-name "MacJavaProperties" \
-title "MacJavaProperties" \
-nosign \
-v
When I launch the resulting app, it reports the usage of the Azul/Zulu JRE as expected.
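If you want to confirm from the terminal which runtime ended up in the bundle, something like this should work (a sketch; the bundles/ output directory and the embedded runtime under Contents/PlugIns/ reflect javapackager's usual layout, as far as I can tell):
open release/bundles/MacJavaProperties.app
ls release/bundles/MacJavaProperties.app/Contents/PlugIns/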
I successfully generated a REST client in Java from a Swagger/OpenAPI v2.0 spec using OpenAPI Generator CLI 3.3.2-SNAPSHOT.
But I already have a REST client, so I just want to generate some models from the spec.
It succeeds when I run:
java -Dmodels -DmodelDocs=false \
-jar modules/openapi-generator-cli/target/openapi-generator-cli.jar generate \
-i swagger.json \
-g java \
-o /temp/my_models
But when I want to generate just specific models with
java -Dmodels=Body,Header -DmodelDocs=false \
-jar modules/openapi-generator-cli/target/openapi-generator-cli.jar generate \
-i swagger.json \
-g java \
-o /temp/my_selected_models
I'm getting this ERROR:
[main] INFO o.o.c.languages.AbstractJavaCodegen - Environment variable JAVA_POST_PROCESS_FILE not defined so the Java code may not be properly formatted. To define it, try 'export JAVA_POST_PROCESS_FILE="/usr/local/bin/clang-format -i"' (Linux/Mac)
What is this JAVA_POST_PROCESS_FILE and how can I specify a valid format to generate the models?
Why does the code generation succeed with all models but fail with a subset?
That message is just informational. It aims to inform you that there's a way to auto-format the generated Java code by setting an environment variable pointing at a code formatter (clang-format in this case):
export JAVA_POST_PROCESS_FILE="/usr/local/bin/clang-format -i"
In other words, it does not affect the code generation process if the environment variable is not specified.
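For example, to silence the message and get formatted models, set the variable and re-run the selective generation from the question (this assumes clang-format really is installed at that path):
export JAVA_POST_PROCESS_FILE="/usr/local/bin/clang-format -i"
java -Dmodels=Body,Header -DmodelDocs=false \
-jar modules/openapi-generator-cli/target/openapi-generator-cli.jar generate \
-i swagger.json \
-g java \
-o /temp/my_selected_models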
I have Qt Creator and Qt 5.5 installed.
QT_QPA_PLATFORM_PLUGIN_PATH = C:\Qt\5.5\msvc2013\plugins
If I disable the environment var, I do get an error when I launch an application from QtC. So the variable seems to be required.
My problem is:
When I run other Qt-based applications (e.g. TeamSpeak), they fail; I always have to disable (delete) QT_QPA_PLATFORM_PLUGIN_PATH first
When I use Kits in QtC and switch between Qt versions (e.g. 5.4, 5.6), the variable is not in sync with that version
How is this supposed to work?
The best solution I have found so far is to set it on the QtC Projects page for that specific build.
Here is the solution that helped me:
In the Windows 10 search box, enter sysdm.cpl
Advanced -> Environment Variables -> under System Variables -> add to:
PATH
C:\Users\<username>\AppData\Local\Programs\Python\Python36-32\Lib\site-packages\pyqt5_tools\plugins\platforms\ (the directory containing qminimal.dll, qoffscreen.dll, qwebgl.dll)
I took the DLLs from the official site: https://www.riverbankcomputing.com/software/pyqt/download5
I use nltk within a Python MapReduce program and use the below command to execute it.
I have found that I am not able to pass nltk correctly along with the command. Could anyone let me know the correct syntax? Thanks.
Let me attempt to provide an answer. Please get back to me if it doesn't work for you.
Maybe you can try the following. Since you are already using the -file option to pass Mapper.py, using just -mapper Mapper.py should do, and try -libjars instead of -archives if you need the classes inside nltk.jar on the classpath.
hadoop jar /usr/lib/gphd/hadoop-mapreduce-2.0.2_alpha_gphd_2_0_1_0/hadoop-streaming-2.0.2-alpha-gphd-2.0.1.0.jar \
-libjars senti-data/nltk.jar \
-file senti-data/traintweets.csv \
-file senti-data/stopwords.txt \
-file /home/cduser/senti-data/Mapper.py \
-mapper Mapper.py \
-input senti-data/inputtweets.txt \
-output output
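Once the job finishes, the results can be inspected with the usual HDFS commands (part-00000 being the typical name of the first output file):
hadoop fs -ls output
hadoop fs -cat output/part-00000 | head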