Installing local .whl files on a Databricks cluster - Airflow

I am trying to connect to a Databricks cluster and install a local Python .whl using DatabricksSubmitRunOperator on Airflow (v2.3.2) with the following configuration. However, it doesn't work and throws a FileNotFound exception (I checked the file path multiple times; the file exists).
task1 = DatabricksSubmitRunOperator(
    task_id=<task_id>,
    job_name=<job_name>,
    existing_cluster_id=<cluster_id>,
    libraries=[
        {"whl": "file:/<local_absolute_path>"}
    ]
)
While the official documentation states that, for .whl files, only DBFS and S3 storage are supported, in Airflow I see the following error message when the file:/ prefix is not attached:
Library installation failed for library due to user error.
Error messages: Python wheels must be stored in dbfs, s3, adls, gs or as a local file. Make sure the URI begins with 'dbfs:', 'file:', 's3:', 'abfss:', 'gs:'
Is it possible to install local .whl files on a Databricks cluster?
An alternative approach I tried is to copy the .whl to DBFS storage and install it from there. The problem with that is that the installation status gets stuck at "pending".
Any help is appreciated.

You can directly install or upload the .whl file from the cluster's Libraries tab in the Databricks UI, or follow the official documentation on installing .whl packages.
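If you want to stay with DatabricksSubmitRunOperator, the usual pattern is to upload the wheel to DBFS first and reference it with a dbfs:/ URI in libraries. Below is a minimal sketch under that assumption; the task definition, cluster id, and DBFS path are placeholders, and the wheel is assumed to have been uploaded beforehand (for example with the Databricks CLI: databricks fs cp my_package-0.1-py3-none-any.whl dbfs:/libraries/).

from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

task1 = DatabricksSubmitRunOperator(
    task_id="run_with_wheel",                                     # placeholder task id
    existing_cluster_id="<cluster_id>",                           # placeholder cluster id
    notebook_task={"notebook_path": "/Shared/<entry_notebook>"},  # placeholder task definition
    libraries=[
        {"whl": "dbfs:/libraries/my_package-0.1-py3-none-any.whl"}  # dbfs: URI instead of file:/
    ],
)

If the DBFS install still hangs in "pending", check the library status on the cluster's Libraries tab and the cluster event log; the operator only forwards the libraries list to the Jobs API, so the failure is usually on the cluster side.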

Related

Installing a package from private GitLab server on Windows

I am struggling with installing a package from a GitLab repository on a Windows computer.
I found different hints but still have problems installing my package from GitLab. First of all, I generated a public and a private key with puttygen.exe. The files needed to be changed afterwards; I had to remove comments and other extras so they look like the files on my Unix system. So now, both the public and the private key file consist of just a single line.
I tried to install my package via devtools::install_git, which takes very long, and I get the error message
Error: Failed to install 'unknown package' from Git:
Error in 'git2r_remote_ls': Failed to authenticate SSH session: Unable to send userauth-publickey request
With devtools::install_gitlab I get a different error message, and I somehow have the feeling that the generated URL doesn't match my GitLab server.
Error: Failed to install 'unknown package' from GitLab:
cannot open URL 'https://gitlab.rlp.net/api/v4/projects/madejung%2FMQqueue.git/repository/files/DESCRIPTION/raw?ref=master'
My complete code to test at the moment is
creds <- git2r::cred_ssh_key(publickey="~/.ssh/id_rsa_gitlab.pub",
                             privatekey="~/.ssh/id_rsa_gitlab")
devtools::install_git(
  url='git@gitlab.rlp.net:madejung/MQqueue.git',
  quiet=FALSE,
  credentials=creds)
devtools::install_gitlab(
  repo='madejung/MQqueue.git',
  host='gitlab.rlp.net',
  quiet=FALSE,
  credentials=creds
)
My id_rsa_gitlab.pub file looks like this and is just a single line:
ssh-rsa AAAA....fiwbw== rsa-key-20200121
The id_rsa_gitlab file contains just the key:
AAABA.....3WNSIAGE=
Update
On my Mac system it works as expected after installing the libssh2 library via Homebrew and recompiling git2r with install.packages("git2r", type = "source").
So the working code on my machine is:
creds <- git2r::cred_ssh_key(publickey="~/.ssh/id_rsa_gitlab.rlp.net.pub",
                             privatekey="~/.ssh/id_rsa_gitlab.rlp.net")
devtools::install_git(
  url='git@gitlab.rlp.net:madejung/MQqueue.git',
  quiet=FALSE,
  credentials=creds
)
For some strange reason, the devtools::install_git call takes about a minute before it finally fails. I have no idea where the problem is.
After struggling for almost a day, I found a solution I can live with...
I first created a PAT (Personal Access Token) in my GitLab account and granted it full API access. For some reason read-only access didn't work, and I am now too tired to figure out what the problem is.
After this I still had problems installing my package; for some reason, the wininet download method doesn't work.
I used capabilities("libcurl") to check whether libcurl is available on my Windows machine (it was) and tried to switch from wininet to libcurl by passing method='libcurl' to the install function. Somehow this was not enough, so I set the download.file.method option directly.
options("download.file.method"='libcurl')
devtools::install_gitlab(
  repo='madejung/MQqueue',
  auth_token='Ho...SOMETHING...xugzb',
  host='gitlab.rlp.net',
  quiet=FALSE, force=TRUE
)

virsh restore with modified xml "Error: xml modification unsupported"

I'm trying to restore a Xen VM (domain) from a state file which I created previously. During the restore I need to modify the XML of this VM with the following command:
virsh restore domU.state --xml newconfig.xml
This command triggers an error with the following text:
error: Failed to restore domain from domU.state
error: argument unsupported: xml modification unsupported
What I have already tried:
1. Restore without XML, which works perfectly.
2. Run the command with the original XML the domain was created from.
3. Run the command with a totally different file which is not even XML.
For steps 2 and 3 the error output was always the same.
Used versions:
Xen 4.11.1
libvirt 5.1.0
Fedora 30
As the error message suggests, the ability to pass in custom XML when restoring a guest from a saved state is unfortunately not supported by the libvirt Xen (libxl) driver. This feature only works with QEMU/KVM at this time.

Load h5 keras model file in R

I'm building an R package for binary classification and I'm using OpenCPU to host it. Currently I've saved the h5 file as a serialized .RData file, which is then loaded into the environment using the .onLoad() function in R. This enables the R script to use that environment variable to load the Keras model with keras::unserialize_model().
I've tried directly using keras::load_model_hdf5() in the code, but after building and deploying on OpenCPU, when I try to hit the prediction API I get the error
ioerror: unable to open file (unable to open file: name = '/home/modelfile_26feb.h5', errno = 13, error message = 'permission denied', flags = 0, o_flags = 0)
I have changed the permissions of the file (777) and even the groups, but I am still getting the error.
I even tried putting the file in the inst/extdata folder so that it gets included in the package, but I still get the same error.
Can anyone help with this, or suggest an alternative way to load the h5 model directly?
Which OS does OpenCPU run on? Why does it try to write in /home/? This is very unusual. The best solution is to adapt your code to write in getwd() or tempdir(). Even better is to store the data in a local database or Redis server and let R read it from there, so you don't need disk access at all.
If you run on Ubuntu Server, reading from /home/ is not permitted by default. If you want to allow this, you need to add AppArmor rules; see section 3.5 of the server manual.
Some relevant topics from the OpenCPU mailing list:
write in home dir: https://groups.google.com/d/msg/opencpu/5vRvgSKY-qE/4xMzZCGJBAAJ
keras in opencpu: https://groups.google.com/d/msg/opencpu/HhRzFVVFdaA/n5Nu1sxyFgAJ
write tmp folder: https://groups.google.com/d/msg/opencpu/Y1tYhaQUzwU/ubSEd_CDCgAJ
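For completeness, here is a minimal sketch in R of the kind of change suggested above: bundle the .h5 file inside the package (e.g. in inst/extdata, as already attempted), locate it with system.file(), and copy it into tempdir() before loading, so nothing outside R's writable directories is touched. The package name "mypackage" is a placeholder.

load_packaged_model <- function() {
  # locate the model shipped in inst/extdata of the installed package
  src <- system.file("extdata", "modelfile_26feb.h5", package = "mypackage")
  # copy it into tempdir(), a location OpenCPU allows R to use
  dst <- file.path(tempdir(), basename(src))
  file.copy(src, dst, overwrite = TRUE)
  keras::load_model_hdf5(dst)
}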

Apache spark-shell error import jars

I have a local Spark 1.5.2 (Hadoop 2.4) installation on Windows, as explained here.
I'm trying to import a jar file that I created in Java using Maven (the jar is jmatrw, which I uploaded to GitHub). Note that the jar does not contain a Spark program and has no dependencies on Spark. I tried the following steps, but none of them seems to work in my installation:
1. I copied the library to "E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar".
2. I edited spark-env.sh and added SPARK_CLASSPATH="E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar".
3. In a command window I ran spark-shell --jars "E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar", but it says "Warning: skip remote jar".
4. In the Spark shell I tried sc.addJar("E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar"); it says "INFO: added jar ... with timestamp".
When I type import it.prz.jmatrw.JMATData, spark-shell replies with error: not found: value it.
I spent a lot of time searching on Stack Overflow and Google; indeed a similar Stack Overflow question exists here, but I'm still not able to import my custom jar.
Thanks
There are two settings in 1.5.2 to reference an external jar. You can add it for the driver or for the executor(s).
I do this by adding the settings to spark-defaults.conf, but you can also set them when launching spark-shell or in SparkConf.
spark.driver.extraClassPath /path/to/jar/*
spark.executor.extraClassPath /path/to/jar/*
I don't see anything really wrong with the way you are doing it, but you could try the conf approach above, or set them programmatically using SparkConf:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.set("spark.driver.extraClassPath", "/path/to/jar/*")
val sc = new SparkContext(conf)
In general, I haven't enjoyed working with Spark on Windows. Try to get onto Unix/Linux.

Shiny Server cannot use RODBC to connect to DB2 but RStudio can in a Docker Container

I am working on deploying a Shiny application in a Docker container onto Bluemix. I am using the rocker/shiny Docker image (https://hub.docker.com/r/rocker/shiny/) as my starting point. I have installed unixODBC-dev, RODBC, the IBM Data Server Driver package, the ibmdbR library for R, and all needed dependencies. My only problem is that when I try to access the Shiny app from a web browser, it fails to execute with the error:
Warning in odbcDriverConnect("DSN=BLUDB", :
[RODBC] ERROR: state 01000, code 0, message [unixODBC][Driver Manager]Can't open lib '/root/db2_cli_odbc_driver/dsdriver/odbc_cli_driver/linuxamd64/clidriver/lib/libdb2o.so' : file not found
Warning in odbcDriverConnect("DSN=BLUDB; :
ODBC connection failed
Error in idaInit(con) : con is not an open connection, please use idaConnect() to create an open connection to the data base.
Initially I had this same problem whenever I tried to use isql to connect to the database or to connect from RStudio. I ran ldd on that library file, found what was missing, and that fixed making connections from the command line and from RStudio. However, my Shiny Server still gives me the same error. Is there anything I am missing?
I ended up solving the problem myself. It turns out the libraries were not accessible to shiny-server, which runs as a service. I moved the DB2 ODBC drivers over to /usr/local/lib to make them accessible. I also ran ldd on the library mentioned in the error message and found that I had to install libxml2 as well. After doing that I simply changed my odbcinst.ini file in /etc to reference the new location of the DB2 library, and now it all works! Hopefully anyone else trying to deploy Shiny apps that rely on connecting to a DB2 database will find this useful.
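For anyone hitting the same thing, a minimal sketch of what the matching /etc/odbcinst.ini entry could look like after relocating the driver; the driver name and exact path below are placeholders, so point Driver at wherever you copied libdb2o.so:

[DB2]
Description = IBM DB2 ODBC CLI driver
Driver      = /usr/local/lib/clidriver/lib/libdb2o.so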
