Trouble building a simple SparkSQL application - sbt

This is a pretty noob question.
I'm trying to learn about SparkSQL. I've been following the example described here:
http://spark.apache.org/docs/1.0.0/sql-programming-guide.html
Everything works fine in the Spark-shell, but when I try to use sbt to build a batch version, I get the following error message:
object sql is not a member of package org.apache.spark
Unfortunately, I'm rather new to sbt, so I don't know how to correct this problem. I suspect that I need to include additional dependencies, but I can't figure out how.
Here is the code I'm trying to compile:
/* TestApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
case class Record(k: Int, v: String)
object TestApp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val data = sc.parallelize(1 to 100000)
val records = data.map(i => new Record(i, "value = "+i))
val table = createSchemaRDD(records, Record)
println(">>> " + table.count)
}
}
The error is flagged on the line where I try to create a SQLContext.
Here is the content of the sbt file:
name := "Test Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
Thanks for the help.

As is often the case, the act of asking the question helped me figure out the answer. The answer is to add the following line in the sbt file.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"
I also realized there is an additional problem in the little program above. There are too many arguments in the call to createSchemaRDD. That line should read as follows:
val table = createSchemaRDD(records)

Thanks! I ran into a similar problem while building a Scala app in Maven. Based on what you did with SBT, I added the corresponding Maven dependencies as follows and now I am able to compile and generate the jar file.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>1.2.1</version>
</dependency>

I got the similar issue, in my case, i just copy pasted the below sbt setup from online with scalaVersion := "2.10.4" but in my environment, i actually have the scala version 2.11.8
so updated & executed sbt package again, issue fixed
name := "Test Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

Related

When using a Scala compiler plugin in sbt, how do you set a library dependency for the plugin?

I'm using a compiler plugin I wrote that depends on the Kyro serialization library. When attempting to use my plugin I set this up in build.sbt (top-level) like this:
lazy val dependencies =
new {
val munit = "org.scalameta" %% "munit" % "0.7.12" % Test
val kyro = "com.esotericsoftware" % "kryo" % "5.0.0-RC9"
}
lazy val commonDependencies = Seq(
dependencies.kyro,
dependencies.munit
)
lazy val root = (project in file("."))
.settings(
libraryDependencies ++= commonDependencies,
Test / parallelExecution := false
)
addCompilerPlugin("co.blocke" %% "dotty-reflection" % reflectionLibVersion)
But when I compile my target project, I get a java.lang.NoClassDefFoundError that it can't find Kyro. I've added kyro to my dependencies, but since this is for the compiler, not my app, it's not picking that up.
How can I properly tell sbt about a dependency my plugin needs?

scalaFX standalone execute jar file

Good day! Help me, please. I startup this example
sbt> run
It's okey all play, after
sbt> package
Will build jar file, after double click messge:
Error: A JNI error has occured, please check your installation and try again.
Scala version: 2.12.4. JVM:1.8.0_152. ScalaFX:8.0.102-R11
hello.scala: `
package hello
import scalafx.Includes._
import scalafx.application.JFXApp
import scalafx.application.JFXApp.PrimaryStage
import scalafx.scene.Scene
import scalafx.scene.paint.Color._
import scalafx.scene.shape.Rectangle
object HelloStage extends JFXApp {
stage = new JFXApp.PrimaryStage {
title.value = "Hello Stage"
width = 600
height = 450
scene = new Scene {
fill = LightGreen
content = new Rectangle {
x = 25
y = 40
width = 100
height = 100
fill <== when(hover) choose Green otherwise Red
}
}
}
}
build.sbt:
name := "Scala"
organization := "scalafx.org"
version := "1.0.5"
scalaVersion := "2.12.4"
scalacOptions ++= Seq("-unchecked", "-deprecation", "-Xcheckinit", "-encoding", "utf8")
resourceDirectory in Compile := (scalaSource in Compile).value
libraryDependencies ++= Seq(
"org.scalafx" %% "scalafx" % "8.0.102-R11",)
addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
fork := true
This is a Java classpath issue. When you try to execute the resulting JAR file, it cannot find the jar files that it needs to run.
Try the following:
Firstly, copy & paste the following to project/plugins.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
This loads the sbt-assembly plugin, which will create a fat JAR file, containing all of the dependencies.
Secondly, change your build.sbt file to the following:
name := "Scala"
organization := "scalafx.org"
version := "1.0.5"
scalaVersion := "2.12.4"
scalacOptions ++= Seq("-unchecked", "-deprecation", "-Xcheckinit", "-encoding", "utf8")
libraryDependencies += "org.scalafx" %% "scalafx" % "8.0.102-R11"
fork := true
mainClass in assembly := Some("hello.HelloStage")
This simplifies what you originally had. The macro paradise compiler plugin is not required, and I also removed the slightly odd resourceDirectory setting.
To create the fat JAR, run the command:
sbt
sbt> assembly
The JAR file you're looking for is most likely located at target/scala-2.12/Scala-assembly-1.0.5.jar. You should now be good to go...
Alternatively, you can add all the necessary JAR files to your classpath. Another plugin that can help with that (you probably shouldn't use it with sbt-assembly) - is sbt-native-packager, which creates installers for you. You can then install your app and run it like a regular application.

Including project in build depending on setting's value, e.g. scalaVersion?

I have a Scala project that is divided into several subprojects:
lazy val core: Seq[ProjectReference] = Seq(common, json_scalaz7, json_scalaz)
I'd like to make the core lazy val conditional on the Scala version I'm currently using, so I tried this:
lazy val core2: Seq[ProjectReference] = scalaVersion {
case "2.11.0" => Seq(common, json_scalaz7)
case _ => Seq(common, json_scalaz7, json_scalaz)
}
Simply speaking, I'd like to exclude json_scalaz for Scala 2.11.0 (when the value of the scalaVersion setting is "2.11.0").
This however gives me the following compilation error:
[error] /home/diego/work/lift/framework/project/Build.scala:39: type mismatch;
[error] found : sbt.Project.Initialize[Seq[sbt.Project]]
[error] required: Seq[sbt.ProjectReference]
[error] lazy val core2: Seq[ProjectReference] = scalaVersion {
[error] ^
[error] one error found
Any idea how to solve this?
Update
I'm using sbt version 0.12.4
This project is the Lift project, which compiles against "2.10.0", "2.9.2", "2.9.1-1", "2.9.1" and now we are working on getting it to compile with 2.11.0. So creating a compile all task would not be practical, as it would take a really long time.
Update 2
I'm hoping there is something like this:
lazy val scala_xml = "org.scala-lang.modules" %% "scala-xml" % "1.0.1"
lazy val scala_parser = "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.1"
...
lazy val common =
coreProject("common")
.settings(description := "Common Libraties and Utilities",
libraryDependencies ++= Seq(slf4j_api, logback, slf4j_log4j12),
libraryDependencies <++= scalaVersion {
case "2.11.0" => Seq(scala_xml, scala_parser)
case _ => Seq()
}
)
but for the projects list
Note how depending on the scala version, I add the scala_xml and scala_parser_combinator libraries
You can see the complete build file here
Cross building a project
Simply speaking, I'd like to exclude json_scalaz for Scala 2.11.0
The built-in support in sbt for this is called cross building, which is described in Cross-Building a Project. Here's from the section with a bit of correction:
Define the versions of Scala to build against in the crossScalaVersions setting. For example, in a .sbt build definition:
crossScalaVersions := Seq("2.10.4", "2.11.0")
To build against all versions listed crossScalaVersions, prefix the action to run with +. For example:
> +compile
Multiple-project builds
sbt also has built-in support to aggregate tasks across multiple projects, which is described Aggregation. If what you need eventually is normal built-in tasks like compile and test, you could set up a dummy aggregate without json_scalaz.
lazy val withoutJsonScalaz = (project in file("without-json-scalaz")).
.aggregate(liftProjects filterNot {_ == json_scalaz}: _*)
From the shell, you should be able to use this as:
> ++2.11.0
> project withoutJsonScalaz
> test
Getting values from multiple scopes
Another feature you might be interested in is ScopeFilter. This has the ability to traverse multiple projects beyond usual aggregation and cross building. You would need to create a setting whose type is ScopeFilter and set it based on scalaBinaryVersion.value. With scope filters, you can do:
val coreProjects = settingKey[ScopeFilter]("my core projects")
val compileAll = taskKey[Seq[sbt.inc.Analysis]]("compile all")
coreProjects := {
(scalaBinaryVersion.value) match {
case "2.10" => ScopeFilter(inProjects(common, json_scalaz7, json_scalaz))
}
}
compileAll := compileAllTask.value
lazy val compileAllTask = Def.taskDyn {
val f = coreProjects.value
(compile in Compile) all f
}
In this case compileAll would have the same effect as +compile, but you could aggregate the result and do something interesting like sbt-unidoc.

Using sbt-aether-deploy with sbt-native-packager

Has anyone published an sbt-native-packager produced artifact (tgz in my case) using sbt-aether-deploy to a nexus repo? (I need this for the timestamped snapshots, specifically the "correct" version tag in nexus' artifact-resolution REST resource).
I can do one or the other but can't figure out how to add the packagedArtifacts in Universal to the artifacts that sbt-aether-deploy deploys to do both.
I suspect the path to pursue would be to the addArtifact() the packagedArtifacts in Universal or creating another AetherArtifact and then to override/replace the deployTask to use that AetherArtifact?
Any help much appreciated.
I am the author of the sbt-aether-deploy plugin, and I just came over this post.
import aether.AetherKeys._
crossPaths := false //needed if you want to remove the scala version from the artifact name
enablePlugins(JavaAppPackaging)
aetherArtifact := {
val artifact = aetherArtifact.value
artifact.attach((packageBin in Universal).value, "dist", "zip")
}
This will also publish the other main artifact.
If you want to disable publishing of the main artifact, then you will need to rewrite the artifact coordinates. Maven requires a main artifact.
I have added a way to replace the main artifact for this purpose, but I can now see that way is kind of flawed. It will still assume that the artifact is published as a jar file. The main artifact type is locked down to that, since the POM packaging is set to jar by default by SBT.
If this is an app, then that limitation is probably OK, since Maven will never resolve that into an artifact.
The "proper" way in Maven terms is to add a classifier to the artifact and change the "packaging" in the POM file to "pom". We will see if I get around to changing that particular part.
Ok, I think I got it amazingly enough. If there's a better way to do it I'd love to hear. Not loving that blind Option.get there..
val tgzCoordinates = SettingKey[MavenCoordinates]("the maven coordinates for the tgz")
lazy val myPackagerSettings = packageArchetype.java_application ++ deploymentSettings ++ Seq(
publish <<= publish.dependsOn(publish in Universal),
publishLocal <<= publishLocal.dependsOn(publishLocal in Universal)
)
lazy val defaultSettings = buildSettings ++ Publish.settings ++ Seq(
scalacOptions in Compile ++= Seq("-encoding", "UTF-8", "-target:jvm-1.7", "-deprecation", "-feature", "-unchecked", "-Xlog-reflective-calls"),
testOptions in Test += Tests.Argument("-oDF")
)
lazy val myAetherSettings = aetherSettings ++ aetherPublishBothSettings
lazy val toastyphoenixProject = Project(
id = "toastyphoenix",
base = file("."),
settings = defaultSettings ++ myPackagerSettings ++ myAetherSettings ++ Seq(
name in Universal := name.value + "_" + scalaBinaryVersion.value,
packagedArtifacts in Universal ~= { _.filterNot { case (artifact, file) => artifact.`type`.contains("zip")}},
libraryDependencies ++= Dependencies.phoenix,
tgzCoordinates := MavenCoordinates(organization.value + ":" + (name in Universal).value + ":tgz:" + version.value).get,
aetherArtifact <<= (tgzCoordinates, packageZipTarball in Universal, makePom in Compile, packagedArtifacts in Universal) map {
(coords: MavenCoordinates, mainArtifact: File, pom: File, artifacts: Map[Artifact, File]) =>
createArtifact(artifacts, pom, coords, mainArtifact)
}
)
)
I took Peter's solution and reworked it slightly, avoiding the naked Option.get by creating the MavenCoordinates directly:
import aether.MavenCoordinates
import aether.Aether.createArtifact
name := "mrb-test"
organization := "me.mbarton"
version := "1.0"
crossPaths := false
packageArchetype.java_application
publish <<= (publish) dependsOn (publish in Universal)
publishLocal <<= (publishLocal) dependsOn (publishLocal in Universal)
aetherPublishBothSettings
aetherArtifact <<= (organization, name in Universal, version, packageBin in Universal, makePom in Compile, packagedArtifacts in Universal) map {
(organization, name, version, binary, pom, artifacts) =>
val nameWithoutVersion = name.replace(s"-$version", "")
createArtifact(artifacts, pom, MavenCoordinates(organization, nameWithoutVersion, version, None, "zip"), binary)
}
The nameWithoutVersion replace works around SBT native packager including the version in the artifact name:
Before: me/mbarton/mrb-test-1.0/1.0/mrb-test-1.0.zip
After: me/mbarton/mrb-test/1.0/mrb-test-1.0.zip
crossPaths avoids the Scala postfix on the version.

How to include additional dependencies when packaging an sbt app using sbt-native-packager

I am trying to include the breeze-natives dependency only when packaging the app (universal:packageBin and debian:packageBin) while always including the breeze dependency. Here is what I came up with :
val breezeDependencySettings = {
val breezeUniversalNativesDependency = libraryDependencies in Universal += D.breezeNatives
val breezeDebianNativesDependency = libraryDependencies in Debian += D.breezeNatives
val breezeDependency = libraryDependencies += D.breeze
Seq(breezeUniversalNativesDependency, breezeDebianNativesDependency, breezeDependency)
}
And in the project that I want to package, I use
settings = (mySettings) ++ SbtNativePackager.packageArchetype.java_server ++
Dependencies.breezeDependencySettings
However, the breeze-natives dependency is not included in the final package created by universal:packageBin. (breeze is included correctly though)
What am I doing wrong?
Not 100% clear on your requirement but have you tried ExportJars := true?
See the excerpt from my build in my question here: https://stackoverflow.com/questions/23035100/how-to-remove-version-from-artifactid-generated-by-sbt-native-packager

Resources