I have just begun with Spark Streaming and I am trying to build a sample application that counts words from a Kafka stream. Although it compiles with sbt package, when I run it I get a NoClassDefFoundError. This post seems to describe the same problem, but the solution there is for Maven and I have not been able to reproduce it with sbt.
KafkaApp.scala:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
object KafkaApp {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("kafkaApp").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(1))

    val kafkaParams = Map(
      "zookeeper.connect" -> "localhost:2181",
      "zookeeper.connection.timeout.ms" -> "10000",
      "group.id" -> "sparkGroup"
    )

    val topics = Map(
      "test" -> 1
    )

    // stream of (topic, ImpressionLog)
    val messages = KafkaUtils.createStream(ssc, kafkaParams, topics, storage.StorageLevel.MEMORY_AND_DISK)

    println(s"Number of words: ${messages.count()}")
  }
}
build.sbt:
name := "Simple Project"
version := "1.1"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1",
  "org.apache.spark" %% "spark-streaming" % "1.1.1",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.1"
)
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
And I submit it with:
bin/spark-submit \
--class "KafkaApp" \
--master local[4] \
target/scala-2.10/simple-project_2.10-1.1.jar
Error:
14/12/30 19:44:57 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.5.252:65077/user/HeartbeatReceiver
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
at KafkaApp$.main(KafkaApp.scala:28)
at KafkaApp.main(KafkaApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
spark-submit does not automatically include the package containing KafkaUtils; you need to bundle it in your application JAR. For that you need to create an all-inclusive uber-jar using sbt assembly. Here is an example build.sbt:
https://github.com/tdas/spark-streaming-external-projects/blob/master/kafka/build.sbt
You obviously also need to add the sbt-assembly plugin to your project:
https://github.com/tdas/spark-streaming-external-projects/tree/master/kafka/project
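For reference, here is a minimal sketch of such a setup (a sketch only: the versions match the question, and the "provided" scoping assumes spark-core and spark-streaming are supplied by the Spark runtime, while the Kafka connector must be bundled into the assembly):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// build.sbt
name := "Simple Project"

version := "1.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.1.1" % "provided",
  // not shipped with spark-submit, so it must go into the uber-jar
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.1"
)

Running sbt assembly then produces a single jar under target/scala-2.10/ that you pass to spark-submit instead of the output of sbt package. If the Kafka transitive dependencies clash in META-INF, add a merge strategy like the one shown in the answer further down.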
Please try including all the dependency jars when submitting the application:
./spark-submit --name "SampleApp" \
  --deploy-mode client \
  --master spark://host:7077 \
  --class com.stackexchange.SampleApp \
  --jars $SPARK_INSTALL_DIR/spark-streaming-kafka_2.10-1.3.0.jar,$KAFKA_INSTALL_DIR/libs/kafka_2.10-0.8.2.0.jar,$KAFKA_INSTALL_DIR/libs/metrics-core-2.2.0.jar,$KAFKA_INSTALL_DIR/libs/zkclient-0.3.jar \
  spark-example-1.0-SNAPSHOT.jar
The following build.sbt worked for me. It also requires you to put the sbt-assembly plugin in a file under the project/ directory.
build.sbt
name := "NetworkStreaming" // https://github.com/sbt/sbt-assembly/blob/master/Migration.md#upgrading-with-bare-buildsbt
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-streaming_2.10" % "1.4.1",
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.4.1", // kafka
"org.apache.hbase" % "hbase" % "0.92.1",
"org.apache.hadoop" % "hadoop-core" % "1.0.2",
"org.apache.spark" % "spark-mllib_2.10" % "1.3.0"
)
mergeStrategy in assembly := {
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
case "log4j.properties" => MergeStrategy.discard
case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
case "reference.conf" => MergeStrategy.concat
case _ => MergeStrategy.first
}
project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
I met the same problem and solved it by building the jar with dependencies.
Add the code below to pom.xml:
<build>
  <sourceDirectory>src/main/java</sourceDirectory>
  <testSourceDirectory>src/test/java</testSourceDirectory>
  <plugins>
    <!--
      Bind the maven-assembly-plugin to the package phase.
      This creates a jar file that includes all dependencies,
      suitable for deployment to a cluster.
    -->
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass></mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
Then run mvn package and submit the generated "example-jar-with-dependencies.jar".
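For example (a sketch; the class and jar names are placeholders taken from the question and the answer above):

bin/spark-submit \
  --class "KafkaApp" \
  --master local[4] \
  target/example-jar-with-dependencies.jar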
I added the dependency externally in Eclipse: Project -> Properties -> Java Build Path -> Libraries -> Add External JARs, and added the required jar.
This solved my issue.
Using Spark 1.6 did the job for me, without the hassle of handling so many external jars... it can get quite complicated to manage.
You could also download the jar file and put it in the Spark lib folder (it is not installed with Spark), instead of beating your head against the wall trying to get the sbt build to work.
http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10_2.10/2.1.1/spark-streaming-kafka-0-10_2.10-2.1.1.jar
copy it to:
/usr/local/spark/spark-2.1.0-bin-hadoop2.6/jars/
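A hedged sketch of those two steps (the URL and install path are the ones given above; adjust for your versions):

wget http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10_2.10/2.1.1/spark-streaming-kafka-0-10_2.10-2.1.1.jar
cp spark-streaming-kafka-0-10_2.10-2.1.1.jar /usr/local/spark/spark-2.1.0-bin-hadoop2.6/jars/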
You can also use the --packages argument of spark-submit; it takes Maven coordinates in the format group:artifact:version,...
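For example, matching the versions from the original question (a sketch; note that --packages is only available in Spark 1.3 and later, and the jar name is the one produced by the question's build):

bin/spark-submit \
  --class "KafkaApp" \
  --master local[4] \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.1.1 \
  target/scala-2.10/simple-project_2.10-1.1.jar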
For import org.apache.spark.streaming.kafka.KafkaUtils to resolve, use the below in build.sbt:
name := "kafka"
version := "0.1"
scalaVersion := "2.11.12"
retrieveManaged := true
fork := true
//libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
//libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0" % "provided"
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8-assembly
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8-assembly" % "2.2.0"
This will fix the issue.
Related
Following this tutorial, I am asked to add enablePlugins(WindowsPlugin) to my SBT configuration.
I did this by adding exactly that line to my build.sbt, but all I get is "Cannot resolve symbol". Do I need to add the dependency somewhere?
Is this an auto plugin, and can anyone explain to me what an auto plugin actually is and how I use it?
UPDATE: My build.sbt looks like this:
name := "ApplicationName"
version := "0.3-SNAPSHOT"
scalaVersion := "2.13.1"
enablePlugins(WindowsPlugin)
mainClass in assembly := Some("application.ConfigEditorApplication")
assemblyJarName in assembly := s"application-$version.jar"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case PathList("reference.conf") => MergeStrategy.concat
  case x => MergeStrategy.first
}
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.9"
libraryDependencies += "commons-io" % "commons-io" % "2.6"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
libraryDependencies += "com.typesafe.scala-logging" % "scala-logging_2.13" % "3.9.2"
libraryDependencies += "com.typesafe.akka" %% "akka-actor-typed" % "2.6.3"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.1.1" % "test"
libraryDependencies += "org.scalamock" %% "scalamock" % "4.4.0" % Test
libraryDependencies += "org.mockito" % "mockito-scala_2.13" % "1.11.3"
libraryDependencies += "org.mockito" % "mockito-scala-scalatest_2.13" % "1.11.3"
I found the solution to my problem: from the beginning I suspected that the plugin needs to be added before it can be enabled. Unfortunately, nothing of that sort was mentioned in the tutorial I was following.
The plugin that has to be added is the native-packager plugin: addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.7.0").
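Concretely, a minimal sketch of the two files (the plugin version follows the line above):

// project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.7.0")

// build.sbt
// WindowsPlugin is an auto plugin shipped by sbt-native-packager; the symbol
// only resolves once the plugin jar is on the build classpath.
enablePlugins(WindowsPlugin)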
You should enable your auto plugin in your build.sbt. The build.sbt file must be at the root of your project, at the same level as the src directory.
You can find information about it here and here.
On the page you mentioned, they say you should set this in your build.sbt. Try this:
// general package information (can be scoped to Windows)
maintainer := "Josh Suereth <joshua.suereth#typesafe.com>"
packageSummary := "test-windows"
packageDescription := """Test Windows MSI."""
// wix build information
wixProductId := "ce07be71-510d-414a-92d4-dff47631848a"
wixProductUpgradeId := "4552fb0e-e257-4dbd-9ecb-dba9dbacf424"
UPDATE
Also, I found this question, which is related to yours. True, it is an old one, but it might give you some hints. Some answers suggest running an update, others deleting and then reimporting the project.
I'm writing a simple Scala project for dl4j. I need to switch between CUDA (for training) and the native backend (for production). I seem to have a problem using the native backend in an assembled jar. I get the error below:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.datavec.api.util.ndarray.RecordConverter.toMinibatchArray(RecordConverter.java:197)
at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.next(RecordReaderMultiDataSetIterator.java:159)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:364)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:439)
at simplecuda.main$.delayedEndpoint$simplecuda$main$1(main.scala:37)
at simplecuda.main$delayedInit$body.apply(main.scala:27)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at simplecuda.main$.main(main.scala:27)
at simplecuda.main.main(main.scala)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5449)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:213)
... 15 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:213)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5446)
... 16 more
My build file is:
name := "simplecuda"
version := "1.0"
scalaVersion := "2.11.8"
// looks like you need to remove ~/.ivy2/cache and ~/.javacpp/cache whenever you switch between platforms
classpathTypes += "maven-plugin"
libraryDependencies ++= Seq(
  "org.scalactic" %% "scalactic" % "3.0.5",
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",
  // "org.nd4j" % "nd4j-cuda-9.2-platform" % "1.0.0-beta2",
  // "org.deeplearning4j" % "deeplearning4j-cuda-9.2" % "1.0.0-beta2"
  "org.nd4j" % "nd4j-native-platform" % "1.0.0-beta2",
  "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-beta2"
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
When I visit http://nd4j.org/getstarted.html to learn about the NoAvailableBackendException, I see that build.sbt should contain the line below:
classpathTypes += "maven-plugin"
I've included this in the above build.sbt, without any luck. After looking at the Gradle instructions I also tried adding the "org.bytedeco.javacpp-presets" % "openblas" % "0.2.20-1.3" classifier "linux-x86_64" dependency, and this did not help either.
I've tried removing ~/.javacpp/cache and ~/.ivy2/cache multiple times, without any luck. The repo with this example is at https://github.com/tomlue/dl4j_scala_troubleshoot
I am writing an Akka application. While creating a fat jar of the application, I don't want the Scala libraries to be packaged with the jar. My build.sbt looks as follows:
lazy val root = (project in file(".")).
  settings(
    name := "akka-app",
    version := "1.0",
    scalaVersion := "2.10.4",
    mainClass in Compile := Some("sample.hello.HelloWorld")
  )

libraryDependencies ++= Seq(
  "com.typesafe.akka" %% "akka-actor" % "2.3.4" % "provided",
  "com.typesafe" % "config" % "1.2.1"
)

// META-INF discarding
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}
But sbt still packages Scala into the jar. I want only the com.typesafe.config library to be present in the jar. Any idea how to achieve this?
You can exclude Scala by modifying the option in the assemblyOption setting:
assemblyOption in assembly :=
(assemblyOption in assembly).value.copy(includeScala = false)
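Put together with the build above, a hedged sketch (sbt-assembly must already be on the build classpath; the plugin version is illustrative):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// build.sbt, appended after the existing settings
assemblyOption in assembly :=
  (assemblyOption in assembly).value.copy(includeScala = false)

Running sbt assembly then produces a jar that contains your classes and the config library, but neither the Scala library (excluded by includeScala = false) nor akka-actor (already marked "provided").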
How can I add process parameters using the sbt-native-packager configuration? I want to add options that redirect the process stderr to a file, so the result looks like this:
sudo -u app bash -c "app >>/var/log/app/stderr.log 2>&1"
I use sbt-native-packager 1.2.0-M5 to build a deb package with JavaServerAppPackaging, JDebPackaging, SystemdPlugin and UpstartPlugin; the exceptions end up only on stderr, not in the logs. Also, I must delete the app pid manually after a crash; if it still exists, I get an error on stderr.
My plugins.sbt:
resolvers += Resolver.bintrayRepo("sbt", "sbt-plugin-releases")
// The Play plugin
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.5.8-netty-4.1")
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.2.0-M5")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")
addSbtPlugin("com.lightbend.sbt" % "sbt-javaagent" % "0.1.1")
libraryDependencies += "org.vafer" % "jdeb" % "1.3" artifacts (Artifact("jdeb", "jar", "jar"))
my build.sbt:
...
debianPackageDependencies in Debian ++= Seq("postgresql-9.5 (>= 9.5.1)")
lazy val root = (project in file(".")).enablePlugins(PlayScala, JavaAgent)
scalaVersion := "2.11.8"
val akkaVersion = "2.4.10"
libraryDependencies ++= Seq(
  "org.postgresql" % "postgresql" % "9.4.1208",
  "org.scalikejdbc" %% "scalikejdbc" % "2.4.0",
  "org.scalikejdbc" %% "scalikejdbc-config" % "2.4.0",
  "org.scalikejdbc" %% "scalikejdbc-play-initializer" % "2.5.1",
  "org.flywaydb" %% "flyway-play" % "3.0.1",
  "com.typesafe.akka" %% "akka-contrib" % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j" % akkaVersion,
  "io.dropwizard.metrics" % "metrics-core" % "3.1.2",
  "io.dropwizard.metrics" % "metrics-jvm" % "3.1.2",
  "org.coursera" % "dropwizard-metrics-datadog" % "1.1.4",
  "com.typesafe.akka" %% "akka-testkit" % akkaVersion % Test,
  "com.relayrides" % "pushy" % "0.8",
  "com.relayrides" % "pushy-dropwizard-metrics-listener" % "0.8",
  "org.eclipse.jetty.alpn" % "alpn-api" % "1.1.3.v20160715" % "runtime",
  ws,
  specs2 % Test
)
resolvers += "Typesafe Releases" at "http://repo.typesafe.com/typesafe/maven-releases/"
resolvers += Resolver.mavenLocal
routesGenerator := InjectedRoutesGenerator
javaOptions in Test ++= Seq("-Dlogger.resource=logback-test.xml")
scalacOptions in Universal ++= Seq("-unchecked", "-deprecation", "-notailcalls")
javaOptions in Universal ++= Seq(
"-J-server",
...
)
...
import com.typesafe.sbt.packager.archetypes.systemloader._
// UpstartPlugin for ubuntu 14.04, SystemdPlugin for ubuntu 16.04
enablePlugins(JavaServerAppPackaging, JDebPackaging, SystemdPlugin, UpstartPlugin)
requiredStartFacilities := Some("datadog-agent.service, systemd-journald.service, postgresql.service")
javaAgents += "org.mortbay.jetty.alpn" % "jetty-alpn-agent" % "2.0.4" % "dist"
P.S. I found a workaround: on Ubuntu 16.04 I can use journald to collect all the logs in the system.
Thanks for updating the question with all relevant information. There are a couple of things here.
Only one Systemloader plugin
You enable both SystemdPlugin and UpstartPlugin. If that works, it only works by accident: no version of native-packager was designed to support multiple systemloaders for a single package type in a single build module.
The solution is to create sub-modules, each with the relevant systemloader enabled, as sketched below.
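A minimal sketch of that layout (module and directory names are hypothetical; the plugins are the ones enabled in the question):

// build.sbt
lazy val app = (project in file("app"))
  .enablePlugins(PlayScala, JavaAgent)

// Ubuntu 16.04 package, driven by systemd
lazy val debSystemd = (project in file("deb-systemd"))
  .enablePlugins(JavaServerAppPackaging, JDebPackaging, SystemdPlugin)
  .dependsOn(app)

// Ubuntu 14.04 package, driven by upstart
lazy val debUpstart = (project in file("deb-upstart"))
  .enablePlugins(JavaServerAppPackaging, JDebPackaging, UpstartPlugin)
  .dependsOn(app)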
Logging to stderr
You are right regarding systemd: it provides facilities to capture the log output of your process. If you like, you can add your findings to the native-packager documentation (there is a systemd plugin section).
The upstart support in native-packager is rather simple. There weren't many requests for it, as Ubuntu is switching to systemd and you can always fall back to systemv, which brings me to the solution to your problem.
You can use the SystemVPlugin, which supports a daemon_log_file. The systemv documentation provides the necessary details; a sketch follows.
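A hedged sketch of that variant (daemonStdoutLogFile is assumed to be the setting that fills the daemon_log_file template variable; check the systemv documentation of your native-packager version for the exact key):

// build.sbt of the deb sub-module
enablePlugins(JavaServerAppPackaging, JDebPackaging, SystemVPlugin)

// redirect the daemon's stdout/stderr into the package's log directory
daemonStdoutLogFile := Some("app.log")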
cheers,
Muki
I'm using the one-jar plugin to generate a fat jar file. Here is what my Build.scala looks like:
import com.github.retronym.SbtOneJar
import sbt._
import Keys._

object build extends Build {

  def standardSettings = Seq(
    exportJars := true
  ) ++ Defaults.defaultSettings

  lazy val metricsProducer = Project("metricsProducer",
    file("beta"),
    settings = standardSettings ++ SbtOneJar.oneJarSettings
  )

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.0",
    "org.apache.kafka" %% "kafka" % "0.9.0.0",
    "joda-time" % "joda-time" % "2.7",
    "io.spray" %% "spray-json" % "1.3.2"
  )
}
When I tried to run this using:
sbt run one-jar
I got: unresolved dependency: org.scala-sbt.plugins#sbt-onejar;0.8: not found
I have the plugin dependency added in plugins.sbt. Any clues?
Not sure if sbt-onejar is still supported. I managed to get this working using the sbt-assembly plugin instead.
https://github.com/sbt/sbt-assembly
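For reference, a minimal sketch of that switch (the plugin and Scala versions are assumptions; the project name and dependencies come from the question, and spark-core is marked "provided" here on the assumption that the Spark runtime supplies it):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// build.sbt
name := "metricsProducer"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.kafka" %% "kafka" % "0.9.0.0",
  "joda-time" % "joda-time" % "2.7",
  "io.spray" %% "spray-json" % "1.3.2"
)

Running sbt assembly then produces the fat jar under target/scala-2.10/.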