OutofMemoryErrory creating fat jar with sbt assembly - jar

We are trying to make a fat jar file containing one small scala source file and a ton of dependencies (simple mapreduce example using spark and cassandra):
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.SparkConf
object VMProcessProject {
def main(args: Array[String]) {
val conf = new SparkConf()
.set("spark.cassandra.connection.host", "127.0.0.1")
.set("spark.executor.extraClassPath", "C:\\Users\\SNCUser\\dataquest\\ScalaProjects\\lib\\spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar")
println("got config")
val sc = new SparkContext("spark://US-L15-0027:7077", "test", conf)
println("Got spark context")
val rdd = sc.cassandraTable("test_ks", "test_col")
println("Got RDDs")
println(rdd.count())
val newRDD = rdd.map(x => 1)
val count1 = newRDD.reduce((x, y) => x + y)
}
}
We do not have a build.sbt file, instead putting jars into a lib folder and source files in the src/main/scala directory and running with sbt run. Our assembly.sbt file looks as follows:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
When we run sbt assembly we get the following error message:
...
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: java heap space
at java.util.concurrent...
We're not sure how to change the jvm settings to increase the memory since we are using sbt assembly to make the jar. Also, if there is something egregiously wrong with how we are writing the code or building our project that'd help us out a lot too; there's been so many headaches trying to set up a basic spark program!

sbt is essentially a java process. You can try to tune your sbt runtime heap size for the OutOfMemory issues.
For 0.13.x, the default memory options sbt uses is
-Xms1024m -Xmx1024m -XX:ReservedCodeCacheSize=128m -XX:MaxPermSize=256m.
And you can enlarge the heap size by doing something like
sbt -J-Xms2048m -J-Xmx2048m assembly

I was including spark as an unmanaged dependency (putting the jar file in the lib folder) which used a lot of memory because it is a huge jar.
Instead, I made a build.sbt file which included spark as a provided, unmanaged dependency.
Secondly, I created the environment variable JAVA_OPTS with the value -Xms256m -Xmx4g, which sets the minimum heap size to 256 megabytes, while allowing the heap to grow to a maximum size of 4 gigabytes. These two combined allowed me to create a jar file with sbt assembly
More info on provided dependencies:
https://github.com/sbt/sbt-assembly

I met the issue before. For my env, set Java_ops doesn't work.
I use below command and it works.
set SBT_OPTS="-Xmx4G"
sbt assembly
There is no issue of out of memeory.

this works for me:
sbt -mem 2000 "set test in assembly := {}" assembly

Related

Execute sbt task without prior compiling (for generating database classes with JOOQ)

I have a PlayFramework 2.7 application which is build by sbt.
For accessing the database, I am using JOOQ. As you know, JOOQ reads my database schema and generates the Java source classes, which then are used in my application.
The application only compiles, if the database classes are present.
I am generating the classes with this custom sbt task:
// https://mvnrepository.com/artifact/org.jooq/jooq-meta
libraryDependencies += "org.jooq" % "jooq-meta" % "3.12.1"
lazy val generateJOOQ = taskKey[Seq[File]]("Generate JooQ classes")
generateJOOQ := {
(runner in Compile).value.run("org.jooq.codegen.GenerationTool",
(fullClasspath in Compile).value.files,
Array("conf/db.conf.xml"),
streams.value.log).failed foreach (sys error _.getMessage)
((sourceManaged.value / "main/generated") ** "*.java").get
}
I googled around and found the script above and modified it a little bit according to my needs, but I do not really understand it, since sbt/scala are new to me.
The problem now is, when I start the generateJOOQ, sbt tries to compile the project first, which fails, because the database classes are missing. So what I have to do is, comment all code out which uses the generated classes, execute the task which compiles my project, generates the database classes and then enable the commented out code again. This is a pain!
I guess the problem is the command (runner in Compile) but I did not find any possibility to execute my custom task WITHOUT compiling first.
Please help! Thank you!
Usually, when you want to generate sources, you should use a source generator, see https://www.scala-sbt.org/1.x/docs/Howto-Generating-Files.html
sourceGenerators in Compile += generateJOOQ
Doing that automatically causes SBT to execute your task first before trying to compile the Scala/Java sources.
Then, you should not really use the runner task, since that is used to run your project, which depends on the compile task, which needs to execute first of course.
You should add the jooq-meta library as a dependeny of the build, not of your sources. That means you should add the libraryDependencies += "org.jooq" % "jooq-meta" % "3.12.1" line e.g. to project/jooq.sbt.
Then, you can simply call the GenerationTool of jooq just as usually in your task:
// build.sbt
generateJOOQ := {
org.jooq.codegen.GenerationTool.main(Array("conf/db.conf.xml"))
((sourceManaged.value / "main/generated") ** "*.java").get
}

sbt aspectj with native packager

I'm attempting to use the sbt-aspectj plugin with the sbt native packager and am running into an issue where the associated -javaagent path to the aspectj load time weaver jar references an ivy cache location rather than something packaged.
That is, after running sbt stage, executing the staged application via bash -x target/universal/stage/bin/myapp/ results in this javaagent:
exec java -javaagent:/home/myuser/.ivy2/cache/org.aspectj/aspectjweaver/jars/aspectjweaver-1.8.10.jar -cp /home/myuser/myproject/target/universal/stage/lib/org.aspectj.aspectjweaver-1.8.10.jar:/home/myuser/myproject/target/universal/stage/lib/otherlibs.jar myorg.MyMainApp args
My target platform is Heroku where the artifacts are built before being effectively 'pushed' out to individual 'dynos' (very analogous to a docker setup). The issue here is that the resulting -javaagent path was valid on the machine in which the 'staged' deployable was built, but will not exist where it's ultimately run.
How can one configure the sbt-aspectj plugin to reference a packaged lib rather than one from the ivy cache?
Current configuration:
project/plugins.sbt:
addSbtPlugin("com.typesafe.sbt" % "sbt-aspectj" % "0.10.6")
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.1.5")
build.sbt (selected parts):
import com.typesafe.sbt.SbtAspectj._
lazy val root = (project in file(".")).settings(
aspectjSettings,
javaOptions in Runtime ++= { AspectjKeys.weaverOptions in Aspectj }.value,
// see: https://github.com/sbt/sbt-native-packager/issues/598#issuecomment-111584866
javaOptions in Universal ++= { AspectjKeys.weaverOptions in Aspectj }.value
.map { "-J" + _ },
fork in run := true
)
Update
I've tried several approaches including pulling the relevant output for javaOptions from existing mappings, but the result is a cyclical dependency error thrown by sbt.
I have something that technically solves my problem but feels unsatisfactory. As of now, I'm including an aspectjweaver dependency directly and using the sbt-native-packager concept of bashScriptExtraDefines to append an appropriate javaagent:
updated build.sbt:
import com.typesafe.sbt.SbtAspectj._
lazy val root = (project in file(".")).settings(
aspectjSettings,
bashScriptExtraDefines += scriptClasspath.value
.filter(_.contains("aspectjweaver"))
.headOption
.map("addJava -javaagent:${lib_dir}/" + _)
.getOrElse(""),
fork in run := true
)
You can add the following settings in your sbt config:
.settings(
retrieveManaged := true,
libraryDependencies += "org.aspectj" % "aspectjweaver" % aspectJWeaverV)
AspectJ weaver JAR will be copied to ./lib_managed/jars/org.aspectj/aspectjweaver/aspectjweaver-[aspectJWeaverV].jar in your project root.
I actually solved this by using the sbt-javaagent plugin to adding agents to the runtime

How to configure Typesafe Activator not to use user home?

I'd like to configure Typesafe Activator and it's bundled tooling not to use my user home directory - I mean ~/.activator (configuration?), ~/.sbt (sbt configuration?) and especially ~/.ivy2, which I'd like to share between my two OSes.
Typesafe "documentation" is of little help.
Need help for both Windows and Linux, please.
From Command Line Options in the official documentation of sbt:
sbt.global.base - The directory containing global settings and plugins (default: ~/.sbt/0.13)
sbt.ivy.home - The directory containing the local Ivy repository and artifact cache (default: ~/.ivy2)
It appears that ~/.activator is set and used in the startup scripts and that's where I'd change the value.
It also appears (in sbt/sbt.boot.properties in activator-launch-1.2.1.jar) that the value of ivy-home is ${user.home}/.ivy2:
[ivy]
ivy-home: ${user.home}/.ivy2
checksums: ${sbt.checksums-sha1,md5}
override-build-repos: ${sbt.override.build.repos-false}
repository-config: ${sbt.repository.config-${sbt.global.base-${user.home}/.sbt}/repositories}
It means that without some development it's only possible to change sbt.global.base.
➜ minimal-scala activator -Dsbt.global.base=./sbt -Dsbt.ivy.home=./ivy2 about
[info] Loading project definition from /Users/jacek/sandbox/sbt-launcher/minimal-scala/project
[info] Set current project to minimal-scala (in build file:/Users/jacek/sandbox/sbt-launcher/minimal-scala/)
[info] This is sbt 0.13.5
[info] The current project is {file:/Users/jacek/sandbox/sbt-launcher/minimal-scala/}minimal-scala 1.0
[info] The current project is built against Scala 2.11.1
[info] Available Plugins: sbt.plugins.IvyPlugin, sbt.plugins.JvmPlugin, sbt.plugins.CorePlugin, sbt.plugins.JUnitXmlReportPlugin
[info] sbt, sbt plugins, and build definitions are using Scala 2.10.4
If you want to see under the hood, you could query for the current values of the home directories for sbt and Ivy with consoleProject command (it assumes you started activator with activator -Dsbt.global.base=./sbt -Dsbt.ivy.home=./ivy2):
> consoleProject
[info] Starting scala interpreter...
[info]
import sbt._
import Keys._
import _root_.sbt.plugins.IvyPlugin
import _root_.sbt.plugins.JvmPlugin
import _root_.sbt.plugins.CorePlugin
import _root_.sbt.plugins.JUnitXmlReportPlugin
import currentState._
import extracted._
import cpHelpers._
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_60).
Type in expressions to have them evaluated.
Type :help for more information.
scala> appConfiguration.eval.provider.scalaProvider.launcher.bootDirectory
res0: java.io.File = /Users/jacek/sandbox/sbt-launcher/minimal-scala/sbt/boot
scala> appConfiguration.eval.provider.scalaProvider.launcher.ivyHome
res1: java.io.File = /Users/jacek/.ivy2
Iff you're really into convincing Activator to use sbt.ivy.home, you have to change sbt/sbt.boot.properties in activator-launch-1.2.2.jar. Just follow the steps:
Unpack sbt/sbt.boot.properties out of activator-launch-1.2.2.jar.
jar -xvf activator-launch-1.2.2.jar sbt/sbt.boot.properties
Edit sbt/sbt.boot.properties and replace ivy-home under [ivy].
ivy-home: ${sbt.ivy.home-${user.home}/.ivy2}
Add the changed sbt/sbt.boot.properties to activator-launch-1.2.2.jar.
jar -uvf activator-launch-1.2.2.jar sbt/sbt.boot.properties
With the change, -Dsbt.ivy.home=./ivy2 works fine.
scala> appConfiguration.eval.provider.scalaProvider.launcher.bootDirectory
res0: java.io.File = /Users/jacek/sandbox/sbt-launcher/minimal-scala/sbt/boot
scala> appConfiguration.eval.provider.scalaProvider.launcher.ivyHome
res1: java.io.File = /Users/jacek/sandbox/sbt-launcher/minimal-scala/ivy2
I was experimenting with this today. After a while, it seems to me like this could be the best thing to do:
Windows:
setx _JAVA_OPTIONS "-Duser.home=C:/my/preferred/home/"
Linux:
export _JAVA_OPTIONS='-Duser.home=/local/home/me'
Then you should be good to go for any Java Program that wants to store data in your home directory.
As an addition to Jacek's answer, another way that worked for me to set the .ivy2 directory was to use the sbt ivyConfiguration task. It returns configuration settings related to ivy, including the path to the ivy home (the one which defaults to ~/.ivy2).
Simply add these few lines to the build.sbt file in your project :
ivyConfiguration ~= { originalIvyConfiguration =>
val config = originalIvyConfiguration.asInstanceOf[InlineIvyConfiguration]
val ivyHome = file("./.ivy2")
val ivyPaths = new IvyPaths(config.paths.baseDirectory, Some(ivyHome))
new InlineIvyConfiguration(ivyPaths, config.resolvers, config.otherResolvers,
config.moduleConfigurations, config.localOnly, config.lock,
config.checksums, config.resolutionCacheDir, config.log)
}
It returns a new ivy configuration identical to the original one, but with the right path to the ivy home directory (here ./.ivy2, so it'll be located just next to the build.sbt file). This way, when sbt uses the ivyConfiguration task to get the ivy configuration, the path to the .ivy2 directory will be the one set above.
It worked for me using sbt 0.13.5 and 0.13.8.
Note: for sbt versions 0.13.6 and above, the construction of the InlineIvyConfiguration needs an additional parameter to avoid being flagged as deprecated, so you might want to change the last line into :
new InlineIvyConfiguration(ivyPaths, config.resolvers, config.otherResolvers,
config.moduleConfigurations, config.localOnly, config.lock,
config.checksums, config.resolutionCacheDir, config.updateOptions, config.log)
(note the additional config.updateOptions)
I had that same problem on Mac. I erased the directory in user home directory named .activator and all related folder. After that run activator run command on terminal. Activator download all of the folder which I deleted. Than problem solved.

SBT hangs resolving dependencies

We have an SBT 0.13.0 multi-project build with 17 projects: 1 leaf project, 15 modules that depend on the leaf (but not each other), and 1 aggregator that depends on the 15 modules.
Here's a very rough idea of what the Build.scala file looks like:
val deps: Seq[Setting[_]] = Seq(libraryDependencies ++= Seq(
"com.foo" % "foo" % "1.0.0",
"com.bar" % "bar" % "1.0.0"
))
val leaf = Project("leaf").settings(deps:_*)
val module1 = Project("module1").dependsOn(leaf).settings(deps:_*)
val module2 = Project("module2").dependsOn(leaf).settings(deps:_*)
...
val module15 = Project("module15").dependsOn(leaf).settings(deps:_*)
val aggregator = Project("aggregator)".dependsOn(
module1,
module2,
...
module15
).settings(deps:_*)
All of these projects list exactly the same set of external dependencies as libraryDependencies. For some reason, when we run the update command in the aggregator, it takes on the order of a minute per project (~15 minutes total!), even though there is no single new dependency to resolve or download.
Worse yet, we recently added one more dependency and now the update command causes SBT to swell up to ~5GB of memory and sometimes hang completely during resolution. How do we debug this?
We tried YourKit to profile it and, it may be a read herring, but so far, the only thing we see is some sbt.MultiLogger class spending a ton of time in a BufferedOutputStream.flush call.
If some of your external dependiencies are actually your own libraries pushed to a local repository and their version is set to "latest" then this hanging is expected. The reason is that ivy tries all repositories for all dependencies when "latest" version is required. Since your libraries are not pushed to public repositories then checking for them on public repositories ends in timeout (it is an ivy issue apparently).
I tried to replicate your setup and created an sbt project with leaf, 15 modules, aggregator and a few external dependencies. It all resolves quite quickly. You can check it out at
https://github.com/darkocerdic/sbt-multiproject-resolving

How do you do develop an SBT project, itself?

Background: I've got a Play 2.0 project, and I am trying to add something to do aspectj weaving using aspects in a jar on some of my classes (Java). (sbt-aspectj doesn't seem to do it, or I can't see how). So I need to add a custom task, and have it depend on compile. I've sort of figured out the dependency part. However, because I don't know exactly what I'm doing, yet, I want to develop this using the IDE (I'm using Scala-IDE). Since sbt projects (and therefore Play projects) are recursively defined, I assumed I could:
Add the eclipse plugin to the myplay/project/project/plugins.sbt
Add the sbt main jar (and aspectj jar) to myplay/project/project/build.sbt:
libraryDependencies ++= Seq(
"org.scala-sbt" % "main" % "0.12.2",
"aspectj" % "aspectj-tools" % "1.0.6"
)
Drop into the myplay/project
Run sbt, run the eclipse task, then import the project into eclipse as a separate project.
I can do this, though the build.scala (and other scala files) aren't initially considered source, and I have to fiddle with the build path a bit. However, even though I've got the sbt main defined for the project, both eclipse IDE and the compile task give errors:
> compile
[error] .../myplay/project/Build.scala:2: not found: object keys
[error] import keys.Keys._
[error] ^
[error] .../myplay/project/SbtAspectJ.scala:2: object Configurations is not a member of package sbt
[error] import sbt.Configurations.Compile
[error] ^
[error] .../myplay/project/SbtAspectJ.scala:3: object Keys is not a member of package sbt
[error] import sbt.Keys._
[error] ^
[error] three errors found
The eclipse project shows neither main nor aspectj-tools in its referenced-libraries. However, if I give it a bogus version (e.g. 0.12.4), reload fails, so it appears to be using
the dependency.
So,...
First: Is this the proper way to do this?
Second: If so, why aren't the libs getting added.
(Third: please don't let this be something dumb that I missed.)
If you are getting the object Keys is not a member of package sbt error, then you should check that you are running sbt from the base directory, and not the /project directory.
sbt-aspectj
sbt-aspectj doesn't seem to do it, or I can't see how.
I think this is the real issue. There's a plugin already that does the work, so try making it work instead of fiddling with the build. Using plugins from build.scala is a bit tricky.
Luckily there are sample projects on github:
import sbt._
import sbt.Keys._
import com.typesafe.sbt.SbtAspectj.{ Aspectj, aspectjSettings, compiledClasses }
import com.typesafe.sbt.SbtAspectj.AspectjKeys.{ binaries, compileOnly, inputs, lintProperties }
object SampleBuild extends Build {
....
// precompiled aspects
lazy val tracer = Project(
"tracer",
file("tracer"),
settings = buildSettings ++ aspectjSettings ++ Seq(
// stop after compiling the aspects (no weaving)
compileOnly in Aspectj := true,
// ignore warnings (we don't have the sample classes)
lintProperties in Aspectj += "invalidAbsoluteTypeName = ignore",
// replace regular products with compiled aspects
products in Compile <<= products in Aspectj
)
)
}
How do you do develop an SBT project, itself?
If you're interested in hacking on the build still the first place to go is the Getting Started guide. Specifically, your question should be answered in .scala Build Definition page.
I think you want your build to utilize "aspectj" % "aspectj-tools" % "1.0.6". If so it should be included in myplay/project/plugins.sbt, and your code should go into myplay/project/common.scala or something. If you want to use IDE, you have have better luck with building it as a sbt plugin. That way your code would go into src/main/scala. Check out sbt/sbt-aspectj or sbt/sbt-assembly on example of sbt plugin structure.

Resources