Code works fine in Zeppelin but not with spark-submit after compiling it with sbt - sbt

I am working with time series for stock market forecasting. My Spark Scala script works fine in Zeppelin, but after compiling it with sbt I do not get the desired results, just null values. I also have an issue concerning an unresolved dependency for com.cloudera.sparkts.
The lines causing the problem, which should produce the expected result:
val df = spark.createDataFrame(tsRdd.mapSeries { vector => {
  val newVec = new org.apache.spark.mllib.linalg.DenseVector(vector.toArray.map(x => if (x.equals(Double.NaN)) 0 else x))
  val arimaModel = ARIMA.fitModel(1, 0, 0, newVec)
  val forecasted = arimaModel.forecast(newVec, DAYS)
  new org.apache.spark.mllib.linalg.DenseVector(forecasted.toArray.slice(forecasted.size - (DAYS + 1), forecasted.size - 1))
}}.toJavaRDD).toDF("lab", "features").withColumn("featuresArr", vecToArray($"features"))
df.select((col("lab") +: Array("f1", "f2", "f3", "f4", "f5").zipWithIndex.map { case (alias, idx) => col("featuresArr").getItem(idx).as(alias) }): _*).show
The output:
I only get null values.

Some dependency issues can be solved by reloading plugins and updating dependencies, especially when the issue is caused by the Ivy cache.
Do the following:
sbt reload plugins
sbt update
sbt reload
If you still have issues, check that the Spark and Scala versions you define in your build are correct.
For the null values, I would start by looking into the intermediate steps of the data pipeline to understand where the values go wrong.

Related

How to use rules_webtesting?

I want to use https://github.com/bazelbuild/rules_webtesting. I am using Bazel 5.2.0.
The whole project can be found here.
My WORKSPACE.bazel file looks like this:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "io_bazel_rules_webtesting",
    sha256 = "3ef3bb22852546693c94e9b0b02c2570e74abab6f800fd58e0cbe79492e49c1b",
    urls = [
        "https://github.com/bazelbuild/rules_webtesting/archive/581b1557e382f93419da6a03b91a45c2ac9a9ec8/rules_webtesting.tar.gz",
    ],
)
load("@io_bazel_rules_webtesting//web:repositories.bzl", "web_test_repositories")
web_test_repositories()
My BUILD.bazel file looks like this:
load("@io_bazel_rules_webtesting//web:py.bzl", "py_web_test_suite")
py_web_test_suite(
    name = "browser_test",
    srcs = ["browser_test.py"],
    browsers = [
        "@io_bazel_rules_webtesting//browsers:chromium-local",
    ],
    local = True,
    deps = ["@io_bazel_rules_webtesting//testing/web"],
)
browser_test.py looks like this:
import unittest
from testing.web import webtest
class BrowserTest(unittest.TestCase):
    def setUp(self):
        self.driver = webtest.new_webdriver_session()
    def tearDown(self):
        try:
            self.driver.quit()
        finally:
            self.driver = None
    # Your tests here
if __name__ == "__main__":
    unittest.main()
When I try to do a bazel build //... I get (under Ubuntu 20.04 and macOS):
INFO: Invocation ID: 74c03efd-9caa-4174-9fda-42f7ff37e38b
ERROR: error loading package '': Every .bzl file must have a corresponding package, but '@io_bazel_rules_webtesting//web:repositories.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
INFO: Elapsed time: 0.038s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
The error message does not make sense to me, since there is a BUILD file in
https://github.com/bazelbuild/rules_webtesting/blob/581b1557e382f93419da6a03b91a45c2ac9a9ec8/BUILD.bazel
and https://github.com/bazelbuild/rules_webtesting/blob/581b1557e382f93419da6a03b91a45c2ac9a9ec8/web/BUILD.bazel.
I also tried a different version of Bazel - but with the same result.
Any ideas on how to get this working?
You need to add strip_prefix = "rules_webtesting-581b1557e382f93419da6a03b91a45c2ac9a9ec8" to your http_archive call.
For debugging, you can look in the folder where Bazel extracts it: bazel-out/../../../external/io_bazel_rules_webtesting. @io_bazel_rules_webtesting//web translates to bazel-out/../../../external/io_bazel_rules_webtesting/web, so if that folder doesn't exist things won't work.
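The prefix matters because GitHub source archives wrap the whole repository in a single top-level directory, so without strip_prefix the web package ends up one level deeper than the label expects. If you are unsure what that wrapper directory is called, a short script can read it out of the tarball. This is only a sketch: the helper names (top_level_dirs, suggest_strip_prefix, make_demo_archive) are made up for illustration, and the demo archive is built in memory to mimic the layout rather than downloaded.

```python
import io
import tarfile
from typing import Optional, Set


def top_level_dirs(archive_bytes: bytes) -> Set[str]:
    """Return the first path component of every entry in a .tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return {name.split("/", 1)[0] for name in tar.getnames() if name}


def suggest_strip_prefix(archive_bytes: bytes) -> Optional[str]:
    """Suggest a strip_prefix when all entries sit under one wrapper directory."""
    roots = top_level_dirs(archive_bytes)
    return next(iter(roots)) if len(roots) == 1 else None


def make_demo_archive(wrapper: str) -> bytes:
    """Build a tiny in-memory archive shaped like a GitHub tarball (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in ("BUILD.bazel", "web/BUILD.bazel", "web/repositories.bzl"):
            info = tarfile.TarInfo(name=f"{wrapper}/{path}")
            info.size = 0  # empty placeholder files are enough to show the layout
            tar.addfile(info, io.BytesIO(b""))
    return buffer.getvalue()


# Hypothetical stand-in for the downloaded rules_webtesting.tar.gz.
demo = make_demo_archive("rules_webtesting-581b1557e382f93419da6a03b91a45c2ac9a9ec8")
print(suggest_strip_prefix(demo))
```

For a real archive you would read the downloaded file's bytes instead of calling make_demo_archive; the printed name is the value to pass as strip_prefix.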

In a sbt project, how to get full list of dependencies with scope?

In an sbt project, how can I get the full list of dependencies (including transitive dependencies) with their scope?
For Example:
xxx.jar(Compile)
yyy.jar(Provided)
zzz.jar(Test)
...
From the sbt shell you can execute one or all of the following commands:
Test / fullClasspath
Runtime / fullClasspath
Compile / fullClasspath
Which will output the jars associated with the scope (Test/Runtime/Compile).
If you want to get a bit more fancy, sbt provides a number of ways of interacting with the outputs generated through the dependency management system. The documentation is here.
For example, you could add this to your build.sbt file:
lazy val printReport = taskKey[Unit]("Report which jars are in each scope.")
printReport := {
  val updateReport: UpdateReport = update.value
  val jarFilter: ArtifactFilter = artifactFilter(`type` = "jar")
  val testFilter = configurationFilter(name = "test")
  val compileFilter = configurationFilter(name = "compile")
  val testJars = updateReport.matching(jarFilter && testFilter)
  val compileJars = updateReport.matching(jarFilter && compileFilter)
  println("Test jars:\n===")
  testJars.sorted.foreach(jar => println(jar.getName))
  println("\n\n******\n\n")
  println("compile jars:\n===")
  compileJars.sorted.foreach(jar => println(jar.getName))
}
It creates a new task printReport which can be executed like a normal sbt command with sbt printReport. It takes the value of the UpdateReport which is generated by the update task, and then filters for jar files in the respective test/compile scopes before printing the results.

NoSuchMethodException when calling sendKeys on object of class org.openqa.selenium.remote.RemoteWebElement via R package rJava

I am trying to use the Selenium WebDriver API directly from R using rJava. I am subject to a fairly restrictive IT environment, so I can't access a remote driver currently (which is why I'm not using the RSelenium package), and I don't have either Chrome or Firefox available, just phantomjs. I am able to get this working okay from the Scala REPL. I used sbt to get all the dependencies; build.sbt contains, for example:
retrieveManaged := true
libraryDependencies ++= Seq(
  "org.seleniumhq.selenium" % "selenium-java" % "3.9.1",
  "com.codeborne" % "phantomjsdriver" % "1.4.4"
)
(Note that I have phantomjs installed as /usr/local/bin/phantomjs, and it is version 2.1.1).
I then copied all the jar files to a single-level folder via cp jars/*/*/*.jar alljars/ containing the following:
animal-sniffer-annotations-1.14.jar httpcore-4.4.6.jar selenium-api-3.9.1.jar
byte-buddy-1.7.9.jar j2objc-annotations-1.1.jar selenium-chrome-driver-3.9.1.jar
checker-compat-qual-2.0.0.jar jline-2.14.5.jar selenium-edge-driver-3.9.1.jar
commons-codec-1.10.jar jsr305-1.3.9.jar selenium-firefox-driver-3.9.1.jar
commons-exec-1.3.jar okhttp-3.9.1.jar selenium-ie-driver-3.9.1.jar
commons-logging-1.2.jar okio-1.13.0.jar selenium-java-3.9.1.jar
error_prone_annotations-2.1.3.jar phantomjsdriver-1.4.4.jar selenium-opera-driver-3.9.1.jar
gson-2.8.2.jar scala-compiler-2.12.4.jar selenium-remote-driver-3.9.1.jar
guava-23.6-jre.jar scala-library-2.12.4.jar selenium-safari-driver-3.9.1.jar
httpclient-4.5.3.jar scala-reflect-2.12.4.jar selenium-support-3.9.1.jar
I start Scala via scala -cp "alljars/*" and can then do the following:
val drv = new org.openqa.selenium.phantomjs.PhantomJSDriver
drv.get("https://www.google.com")
val q = drv.findElementByName("q")
q.sendKeys("rJava selenium")
q.submit
drv.getTitle
I think the following is roughly the same thing in R using rJava:
library(rJava)
.jinit()
jars <- dir("alljars", pattern = "*.jar", full.names = TRUE)
.jaddClassPath(jars)
drv <- .jnew('org/openqa/selenium/phantomjs/PhantomJSDriver')
drv$get("https://www.google.com")
q <- drv$findElementByName("q")
q$sendKeys("rJava selenium")
q$submit()
drv$getTitle()
This fails at the point q$sendKeys("rJava selenium") with the following error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.NoSuchMethodException: No suitable method for the given parameters
In RStudio, if I type q$ and press TAB, sendKeys is definitely in the list of available methods. I tried to be explicit about this, and tried:
keys <- .jnew("java/lang/String", "rJava selenium")
keys <- .jcast(keys, "java/lang/CharSequence", check = TRUE)
q <- .jcast(q, "org/openqa/selenium/WebElement", check = TRUE)
.jcall(q, "V", "sendKeys", keys)
which resulted in the following error:
Error in .jcall(q, "V", "sendKeys", keys) :
method sendKeys with signature (Ljava/lang/CharSequence;)V not found
q has class org/openqa/selenium/remote/RemoteWebElement in R, and org/openqa/selenium/WebElement in Scala; but in both cases the return is void and the required argument is CharSequence according to the javadocs. I tried a few variations of this--java.lang.String instead of CharSequence, RemoteWebElement instead of WebElement, etc., but no joy.
I doubt this is a problem with rJava, but I'm stumped nonetheless and need help!
Oh good grief. I didn't know about .jmethods. Running this:
> .jmethods(q, "sendKeys")
[1] "public void org.openqa.selenium.remote.RemoteWebElement.sendKeys(java.lang.CharSequence[])"
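So, basically, my problem was that I was passing String instead of String[]. sendKeys is declared with varargs (CharSequence...), which the JVM exposes as a single CharSequence[] parameter, so a reflective call has to pass an array. That is, instead of: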
q$sendKeys("rJava selenium")
I can use:
q$sendKeys(.jarray("rJava selenium"))
The more you know...

Including project in build depending on setting's value, e.g. scalaVersion?

I have a Scala project that is divided into several subprojects:
lazy val core: Seq[ProjectReference] = Seq(common, json_scalaz7, json_scalaz)
I'd like to make the core lazy val conditional on the Scala version I'm currently using, so I tried this:
lazy val core2: Seq[ProjectReference] = scalaVersion {
  case "2.11.0" => Seq(common, json_scalaz7)
  case _ => Seq(common, json_scalaz7, json_scalaz)
}
Simply speaking, I'd like to exclude json_scalaz for Scala 2.11.0 (when the value of the scalaVersion setting is "2.11.0").
This however gives me the following compilation error:
[error] /home/diego/work/lift/framework/project/Build.scala:39: type mismatch;
[error] found : sbt.Project.Initialize[Seq[sbt.Project]]
[error] required: Seq[sbt.ProjectReference]
[error] lazy val core2: Seq[ProjectReference] = scalaVersion {
[error] ^
[error] one error found
Any idea how to solve this?
Update
I'm using sbt version 0.12.4
This project is the Lift project, which compiles against "2.10.0", "2.9.2", "2.9.1-1", "2.9.1" and now we are working on getting it to compile with 2.11.0. So creating a compile all task would not be practical, as it would take a really long time.
Update 2
I'm hoping there is something like this:
lazy val scala_xml = "org.scala-lang.modules" %% "scala-xml" % "1.0.1"
lazy val scala_parser = "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.1"
...
lazy val common =
  coreProject("common")
    .settings(description := "Common Libraties and Utilities",
      libraryDependencies ++= Seq(slf4j_api, logback, slf4j_log4j12),
      libraryDependencies <++= scalaVersion {
        case "2.11.0" => Seq(scala_xml, scala_parser)
        case _ => Seq()
      }
    )
but for the projects list
Note how depending on the scala version, I add the scala_xml and scala_parser_combinator libraries
You can see the complete build file here
Cross building a project
Simply speaking, I'd like to exclude json_scalaz for Scala 2.11.0
The built-in support in sbt for this is called cross building, which is described in Cross-Building a Project. Here is an excerpt from that section, with a small correction:
Define the versions of Scala to build against in the crossScalaVersions setting. For example, in a .sbt build definition:
crossScalaVersions := Seq("2.10.4", "2.11.0")
To build against all versions listed in crossScalaVersions, prefix the action to run with +. For example:
> +compile
Multiple-project builds
sbt also has built-in support for aggregating tasks across multiple projects, which is described in Aggregation. If what you eventually need are normal built-in tasks like compile and test, you could set up a dummy aggregate without json_scalaz.
lazy val withoutJsonScalaz = (project in file("without-json-scalaz"))
  .aggregate(liftProjects filterNot {_ == json_scalaz}: _*)
From the shell, you should be able to use this as:
> ++2.11.0
> project withoutJsonScalaz
> test
Getting values from multiple scopes
Another feature you might be interested in is ScopeFilter. This has the ability to traverse multiple projects beyond usual aggregation and cross building. You would need to create a setting whose type is ScopeFilter and set it based on scalaBinaryVersion.value. With scope filters, you can do:
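val coreProjects = settingKey[ScopeFilter]("my core projects")
val compileAll = taskKey[Seq[sbt.inc.Analysis]]("compile all")
coreProjects := {
  (scalaBinaryVersion.value) match {
    // Assumption taken from the question: leave json_scalaz out for Scala 2.11.
    case "2.11" => ScopeFilter(inProjects(common, json_scalaz7))
    // Every other version keeps all three projects and avoids a MatchError.
    case _ => ScopeFilter(inProjects(common, json_scalaz7, json_scalaz))
  }
}
compileAll := compileAllTask.value
lazy val compileAllTask = Def.taskDyn {
  val f = coreProjects.value
  (compile in Compile) all f
}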
In this case compileAll would have the same effect as +compile, but you could aggregate the result and do something interesting like sbt-unidoc.

How do I evaluate an sbt SettingKey?

I want to combine the sbt-release plugin with the Play framework.
The plugin reads the current version number from a file named version.sbt. Its content is
version in ThisBuild := "0.41.0-SNAPSHOT"
I would like to use this setting in my main build file but the variable version is of type sbt.SettingKey.
There is an evaluate method but for the life of me I can't figure out what to pass in to get the String I defined in version.sbt.
I tried the accepted answer's solution but it didn't compile. (Play 2.1.5)
[error] (ss: sbt.Project.Setting[_]*)sbt.Project <and>
[error] => Seq[sbt.Project.Setting[_]]
[error] cannot be applied to (Seq[sbt.ModuleID])
[error] val main = play.Project(appName).settings(appDependencies).settings(releaseSettings).settings(
[error] ^
[error] one error found
Instead I came up with this solution:
...
lazy val appSettings = Defaults.defaultSettings ++ ... ++ releaseSettings
val main = play.Project(appName, dependencies = appDependencies, settings = appSettings).settings(
  version <<= version in ThisBuild,
  ...
)
This is a small shortcoming of the play.Project constructor: it expects a static version number, not one from a setting key.
However, the only required parameter is the application name, so you can switch from something like:
val main = play.Project(appName, appVersion, appDependencies, settings =
  Defaults.defaultSettings ++ releaseSettings).settings(...)
to
val main = play.Project(appName).settings(appDependencies).
  settings(releaseSettings).settings(...)
Normally, the version defined in version.sbt should be picked up here automagically. If it isn't, you can always add to the above:
.settings(applicationVersion <<= version in ThisBuild)

Resources