Need a pathlist pattern to make sbt happy - sbt

So with one dependency I've entered a level of entanglement that I can't escape. I hate to think what will happen when I bring in the commented jars:
libraryDependencies ++= Seq(
// "org.apache.avro" % "avro" % "1.8.1" excludeAll ExclusionRule(organization = "log4j"),
// "org.apache.kafka" %% "kafka" % "0.10.0.0",
"org.apache.hive" % "hive-jdbc" % "1.2.2"
excludeAll ExclusionRule(organization = "log4j")
exclude("org.apache.hadoop", "hadoop-yarn-api"),
"log4j" % "log4j" % "1.2.16"
)
Using sbt assembly, I am getting the following deduplicate problem:
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] C:\Users\G517329\.ivy2\cache\org.datanucleus\datanucleus-api-jdo\jars\datanucleus-api-jdo-3.2.6.jar:plugin.xml
[error] C:\Users\G517329\.ivy2\cache\org.datanucleus\datanucleus-core\jars\datanucleus-core-3.2.10.jar:plugin.xml
[error] C:\Users\G517329\.ivy2\cache\org.datanucleus\datanucleus-rdbms\jars\datanucleus-rdbms-3.2.9.jar:plugin.xml
Where I'm stuck is trying to find a merge strategy that allows these three jars to happily coexist in one fat jar. I've tried several variations of the strategy below, but am making no progress:
assemblyMergeStrategy in assembly := {
case PathList("javax", "transaction", xs @ _*) => MergeStrategy.first
case PathList(xs @ _*) if xs.last endsWith "plugin.xml" => MergeStrategy.discard
// case PathList("org", "datanucleus", "datanucleus-api-jdo", xs @ _*) => MergeStrategy.last
// case PathList("org", "datanucleus", "datanucleus-rdbms", xs @ _*) => MergeStrategy.last
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
I must have some misunderstanding of how this works because it seems to me that the second line should find every plugin.xml file in every jar and nuke it.
Has anyone successfully included hive-jdbc in a fat jar?
UPDATE:
case "plugin.xml" => MergeStrategy.discard //or .last should work, I would think, but that throws:
[error] (*:assembly) java.util.NoSuchElementException

So I load up the project this morning and give it one more try. This time
case "plugin.xml" => MergeStrategy.last
does not throw, and I have a fat jar.
Guh.
UPDATE:
I don't know where the problem lies (IntelliJ, the sbt console, or sbt-assembly), but some caching is going on that makes merge issues really difficult to troubleshoot. The only reliable way I have found to ensure that changes to build.sbt are actually applied is to shut down the entire IDE and re-open it.
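To check which rule a given jar entry falls under, the pattern matching can be simulated outside sbt. The following is a minimal sketch, not sbt-assembly's own code: the PathList object is a stand-in written for illustration that simply splits on "/", and the strategies are plain string labels instead of MergeStrategy values.

```scala
// Stand-in for sbt-assembly's PathList extractor (an assumption for
// illustration): it splits a jar entry path into its segments.
object PathList {
  def unapplySeq(path: String): Option[Seq[String]] = {
    val parts = path.split("/").toList
    if (parts.isEmpty) None else Some(parts)
  }
}

object MergeRules {
  // Returns a label naming the rule that matched; the real plugin
  // returns MergeStrategy values instead of strings.
  def strategyFor(path: String): String = path match {
    case PathList("javax", "transaction", _*)                => "first"
    case PathList(xs @ _*) if xs.last.endsWith("plugin.xml") => "last"
    case PathList("META-INF", _*)                            => "discard"
    case _                                                   => "default"
  }

  def main(args: Array[String]): Unit = {
    val entries = Seq(
      "plugin.xml",
      "META-INF/plugin.xml",
      "META-INF/MANIFEST.MF",
      "javax/transaction/Foo.class",
      "org/example/App.class"
    )
    // Print the rule chosen for each sample entry
    entries.foreach(p => println(s"$p -> ${strategyFor(p)}"))
  }
}
```

Cases are tried top to bottom, so a rule for plugin.xml has to appear before any broader rule that would also match the same path. Under these assumptions the guard pattern matches a root-level plugin.xml as well as nested ones, which is consistent with the rule itself being fine and stale build state being the real obstacle.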

Related

Using DL4J in Scala and no backend found

I'm writing a simple scala project for dl4j. I need to switch between cuda (for training) and native for production. I seem to have a problem using native in an assembled jar. I get the below error:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.datavec.api.util.ndarray.RecordConverter.toMinibatchArray(RecordConverter.java:197)
at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.next(RecordReaderMultiDataSetIterator.java:159)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:364)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:439)
at simplecuda.main$.delayedEndpoint$simplecuda$main$1(main.scala:37)
at simplecuda.main$delayedInit$body.apply(main.scala:27)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at simplecuda.main$.main(main.scala:27)
at simplecuda.main.main(main.scala)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5449)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:213)
... 15 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:213)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5446)
... 16 more
My build file is:
name := "simplecuda"
version := "1.0"
scalaVersion := "2.11.8"
// looks like you need to remove ~/.ivy2/cache and ~/.javacpp/cache whenever you switch between platforms
classpathTypes += "maven-plugin"
libraryDependencies ++= Seq(
"org.scalactic" %% "scalactic" % "3.0.5",
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
// "org.nd4j" % "nd4j-cuda-9.2-platform" % "1.0.0-beta2",
// "org.deeplearning4j" % "deeplearning4j-cuda-9.2" % "1.0.0-beta2"
"org.nd4j" % "nd4j-native-platform" % "1.0.0-beta2",
"org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-beta2"
)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
When I visit http://nd4j.org/getstarted.html to learn about the NoAvailableBackendException, I see that build.sbt should contain the following line:
classpathTypes += "maven-plugin"
I've included this in the build.sbt above, without any luck. After looking at the Gradle instructions I tried adding the "org.bytedeco.javacpp-presets" % "openblas" % "0.2.20-1.3" classifier "linux-x86_64" dependency, and this did not help either.
I've tried removing ~/.javacpp/cache and ~/.ivy2/cache multiple times without any luck. The repo with this example is at https://github.com/tomlue/dl4j_scala_troubleshoot
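One thing that may be worth checking, as an assumption that nothing in this thread confirms: if ND4J locates its backend through java.util.ServiceLoader, the registration lives in a file under META-INF/services, and the merge strategy above discards everything under META-INF. A hedged sketch of a strategy that keeps those service files while still discarding the rest of META-INF:

```scala
assemblyMergeStrategy in assembly := {
  // Keep service registrations; merge duplicates line by line
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Everything else under META-INF is still dropped, as before
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case x                                         => MergeStrategy.first
}
```

The services rule has to come before the general META-INF rule, because the first matching case wins.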

Variable dependent tasks in SBT

I can make a Makefile that has a target that processes all sources in the directory.
SOURCE_DIR := src
TARGET_DIR := target
SOURCES := $(wildcard $(SOURCE_DIR)/*)
$(TARGET_DIR)/%: $(SOURCE_DIR)/%
	md5sum $^ > $@
all: $(SOURCES:$(SOURCE_DIR)/%=$(TARGET_DIR)/%)
A nice advantage here is that each file is a separate target, so they can be processed incrementally, and concurrently. The concurrent part is important in this situation.
I am trying to do something similar with SBT, but am finding it surprisingly difficult. The SBT analog of a Make target seems to be a task, so I tried creating one task that aggregates a variable number of smaller tasks.
import org.apache.commons.codec.digest.DigestUtils
all <<= Def.task().dependsOn({
file(sourceDir.value).listFiles.map { source =>
val target = rebase(sourceDir.value, targetDir.value)(source)
Def.task {
IO.write(target, DigestUtils.md5Hex(IO.readBytes(source)))
}
}
}: _*)
I get the error
`value` can only be used within a task or setting macro, such as :=, +=, ++=,
Def.task, or Def.setting
How can I make a proper SBT build file that resembles my Makefile, with a dynamic number of concurrent targets/tasks?
I needed flatMap.
all <<= (sourceDir, targetDir).flatMap { (sourceDir, targetDir) =>
task{}.dependsOn({
file(sourceDir).listFiles.map { source =>
task {
val target = rebase(sourceDir, targetDir)(source)
IO.write(target, DigestUtils.md5Hex(IO.readBytes(source)))
}
}
}: _*)
}
There might be a slicker way to do task{}.dependsOn(...: _*), but I don't know what it is.
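In later sbt versions the same idea can probably be written with Def.taskDyn, which lets a task read settings first and then decide which tasks to depend on. The following is a sketch only, assuming sbt 1.x; the keys md5SourceDir, md5TargetDir and md5All and the directory names are hypothetical, and java.security.MessageDigest replaces commons-codec so the build needs no extra dependency.

```scala
// build.sbt (sbt 1.x syntax assumed; all key names here are hypothetical)
import java.security.MessageDigest

val md5SourceDir = settingKey[File]("directory whose files are hashed")
val md5TargetDir = settingKey[File]("directory receiving one hash file per source")
val md5All       = taskKey[Unit]("hash every file in md5SourceDir")

md5SourceDir := baseDirectory.value / "src-files"
md5TargetDir := target.value / "md5"

// Hex-encoded MD5 digest of a byte array
def md5Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString

md5All := Def.taskDyn {
  val srcDir = md5SourceDir.value
  val outDir = md5TargetDir.value
  // listFiles returns null when the directory is missing, so wrap it in Option
  val sources = Option(srcDir.listFiles).map(_.toSeq).getOrElse(Seq.empty).filter(_.isFile)
  // One independent task per source file
  val perFile = sources.map { source =>
    Def.task {
      IO.write(outDir / source.getName, md5Hex(IO.readBytes(source)))
    }
  }
  // An empty task that depends on all per-file tasks
  Def.task(()).dependsOn(perFile: _*)
}.value
```

Running sbt md5All would then evaluate the per-file tasks, which sbt is free to schedule in parallel because they do not depend on each other. Unlike the Makefile, this sketch does no up-to-date checking, so every file is rehashed on each run.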

Why is the error "Not a valid command: assembly-merge-strategy"?

I have the following build.sbt file.
import AssemblyKeys._
name := "approxstrmatch"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies+="org.apache.spark" %% "spark-core" % "1.0.0"
resolvers += "AkkaRepository" at "http://repo.akka.io/releases/"
// My merge strategy is specified here.
lazy val app = Project("approxstrmatch", file("approxstrmatch"),
settings = buildSettings ++ assemblySettings ++ Seq(
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
case PathList("javax", "transaction", xs @ _*) => MergeStrategy.first
case PathList("javax", "mail", xs @ _*) => MergeStrategy.first
case PathList("javax", "activation", xs @ _*) => MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
case "application.conf" => MergeStrategy.concat
case "unwanted.txt" => MergeStrategy.discard
case x => old(x)
}
})
)
mainClass in assembly := Some("approxstrmatch.JaccardScore")
// jarName in assembly := "approstrmatch.jar"
When I execute the following command sbt assembly-merge-strategy there's an error I don't understand. Any help appreciated.
approxstrmatch]$ sbt assembly-merge-strategy
[info] Loading project definition from /apps/sameert/software/approxstrmatch/project
[info] Set current project to approxstrmatch (in buildfile:/apps/sameert/software/approxstrmatch/)
[error] Not a valid command: assembly-merge-strategy
[error] No such setting/task
My understanding tells me there's no assembly-merge-strategy task in sbt-assembly plugin (I can only suspect you use that plugin in your build).
Execute assembly as described in https://github.com/sbt/sbt-assembly#assembly-task as "an awesome new assembly task which will compile your project, run your tests, and then pack your class files and all your dependencies into a single JAR file".
There is a setting named assemblyMergeStrategy (aka assembly-merge-strategy). It's just that you won't directly use it. The way sbt-assembly uses it is scoped to assembly task:
mergeStrategy in assembly <<= ....
So here's what you have to do to call it from the shell:
$ sbt assembly::assemblyMergeStrategy
[info] blabla other things...
[info] <function1>
Adding assemblySettings to your build.sbt will help.
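For comparison, here is roughly what the same build could look like with a newer sbt-assembly. This is a sketch under the assumption that sbt-assembly 0.12.0 or later is declared in project/plugins.sbt, where the plugin loads automatically and the key is named assemblyMergeStrategy; the list of cases is shortened for brevity.

```scala
// build.sbt, assuming sbt-assembly 0.12.0 or later in project/plugins.sbt
name := "approxstrmatch"
version := "1.0"
scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*)         => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
  case "application.conf"                            => MergeStrategy.concat
  case x =>
    // Fall back to the plugin's default for everything else
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

mainClass in assembly := Some("approxstrmatch.JaccardScore")
```

In either style the merge strategy is a setting that the assembly task consults, so the command to run is still sbt assembly.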

Including project in build depending on setting's value, e.g. scalaVersion?

I have a Scala project that is divided into several subprojects:
lazy val core: Seq[ProjectReference] = Seq(common, json_scalaz7, json_scalaz)
I'd like to make the core lazy val conditional on the Scala version I'm currently using, so I tried this:
lazy val core2: Seq[ProjectReference] = scalaVersion {
case "2.11.0" => Seq(common, json_scalaz7)
case _ => Seq(common, json_scalaz7, json_scalaz)
}
Simply speaking, I'd like to exclude json_scalaz for Scala 2.11.0 (when the value of the scalaVersion setting is "2.11.0").
This however gives me the following compilation error:
[error] /home/diego/work/lift/framework/project/Build.scala:39: type mismatch;
[error] found : sbt.Project.Initialize[Seq[sbt.Project]]
[error] required: Seq[sbt.ProjectReference]
[error] lazy val core2: Seq[ProjectReference] = scalaVersion {
[error] ^
[error] one error found
Any idea how to solve this?
Update
I'm using sbt version 0.12.4
This project is the Lift project, which compiles against "2.10.0", "2.9.2", "2.9.1-1", "2.9.1" and now we are working on getting it to compile with 2.11.0. So creating a compile all task would not be practical, as it would take a really long time.
Update 2
I'm hoping there is something like this:
lazy val scala_xml = "org.scala-lang.modules" %% "scala-xml" % "1.0.1"
lazy val scala_parser = "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.1"
...
lazy val common =
coreProject("common")
.settings(description := "Common Libraties and Utilities",
libraryDependencies ++= Seq(slf4j_api, logback, slf4j_log4j12),
libraryDependencies <++= scalaVersion {
case "2.11.0" => Seq(scala_xml, scala_parser)
case _ => Seq()
}
)
but for the projects list
Note how, depending on the Scala version, I add the scala-xml and scala-parser-combinators libraries.
You can see the complete build file here
Cross building a project
Simply speaking, I'd like to exclude json_scalaz for Scala 2.11.0
The built-in support in sbt for this is called cross building, which is described in Cross-Building a Project. Here is an excerpt from that section, lightly corrected:
Define the versions of Scala to build against in the crossScalaVersions setting. For example, in a .sbt build definition:
crossScalaVersions := Seq("2.10.4", "2.11.0")
To build against all versions listed in crossScalaVersions, prefix the action to run with +. For example:
> +compile
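When cross building, version-specific dependencies are often keyed on the binary version instead of an exact version string. The following is a sketch assuming sbt 0.13 or later (the .value syntax is not available in 0.12.4); it reuses the scala_xml and scala_parser definitions from the question and relies on CrossVersion.partialVersion to extract the major and minor version numbers.

```scala
lazy val scala_xml    = "org.scala-lang.modules" %% "scala-xml" % "1.0.1"
lazy val scala_parser = "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.1"

crossScalaVersions := Seq("2.10.4", "2.11.0")

libraryDependencies ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    // Scala 2.11 and later: these modules are separate artifacts
    case Some((2, minor)) if minor >= 11 => Seq(scala_xml, scala_parser)
    // Older versions: nothing extra to add
    case _                               => Seq.empty
  }
}
```

Matching on the partial version means a later 2.11.x release would be handled by the same case without editing the build.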
Multiple-project builds
sbt also has built-in support for aggregating tasks across multiple projects, which is described in Aggregation. If all you eventually need are normal built-in tasks like compile and test, you could set up a dummy aggregate without json_scalaz.
lazy val withoutJsonScalaz = (project in file("without-json-scalaz"))
.aggregate(liftProjects filterNot {_ == json_scalaz}: _*)
From the shell, you should be able to use this as:
> ++2.11.0
> project withoutJsonScalaz
> test
Getting values from multiple scopes
Another feature you might be interested in is ScopeFilter. This has the ability to traverse multiple projects beyond usual aggregation and cross building. You would need to create a setting whose type is ScopeFilter and set it based on scalaBinaryVersion.value. With scope filters, you can do:
val coreProjects = settingKey[ScopeFilter]("my core projects")
val compileAll = taskKey[Seq[sbt.inc.Analysis]]("compile all")
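coreProjects := {
(scalaBinaryVersion.value) match {
// json_scalaz is left out for 2.11, as requested in the question
case "2.11" => ScopeFilter(inProjects(common, json_scalaz7))
case _ => ScopeFilter(inProjects(common, json_scalaz7, json_scalaz))
}
}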
compileAll := compileAllTask.value
lazy val compileAllTask = Def.taskDyn {
val f = coreProjects.value
(compile in Compile) all f
}
In this case compileAll would have the same effect as +compile, but you could aggregate the result and do something interesting like sbt-unidoc.

Merge Strategy not behaving as expected

In build.scala I have the following:
mergeStrategy <<= (mergeStrategy in assembly) {(old) => {
case PathList("javax", "servlet", "resources", xs @ _*) => MergeStrategy.first
case x => old(x)
}}
However when I run assembly I see:
[info] Merging 'javax/servlet/resources/web-app_2_2.dtd' with strategy 'deduplicate'
showing that it is using the "deduplicate" strategy, not the "first" strategy. This gives the following error:
[error] {file:/home/dan/tesla/}tesla-appengine/*:assembly: deduplicate: different file contents found in the following:
[error] /home/dan/.ivy2/cache/com.google.appengine/appengine-tools-sdk/jars/appengine-tools-sdk-1.7.3.jar:javax/servlet/resources/web-app_2_2.dtd
[error] /home/dan/.ivy2/cache/javax.servlet/servlet-api/jars/servlet-api-2.5.jar:javax/servlet/resources/web-app_2_2.dtd
I had not applied this setting in the proper scope. I needed to set
mergeStrategy in assembly <<= ...
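Spelled out in full, the correctly scoped setting would look roughly like this. It is the snippet from the question with only the left-hand side changed, and it assumes the same older sbt-assembly syntax used there:

```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
  // Take the first copy of any duplicated servlet resource
  case PathList("javax", "servlet", "resources", xs @ _*) => MergeStrategy.first
  // Defer to the previous strategy for everything else
  case x => old(x)
}}
```

Without the in assembly scope, the setting is defined on a key the assembly task never reads, so the default deduplicate strategy stays in effect.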

Resources