Using pyodbc in Azure Databricks to connect to SQL Server

import pyodbc
pyodbc.connect('Driver={SQL SERVER};'
               'Server=server name;'
               'Database=database name;'
               'UID=my uid;'
               'PWD=my password;'
               'Authentication=ActiveDirectoryPassword')
Running the above code in a Databricks notebook, I get the following error:
Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'SQL SERVER' : file not found (0) (SQLDriverConnect)")

By default, Azure Databricks does not have the ODBC driver installed.
For SQL Server: you can resolve the issue with the following script:
sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
sudo pip install --upgrade pip
pip install pyodbc
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql
For Azure SQL Database: run the following commands in a single cell to install the Microsoft ODBC Driver for SQL Server on the Azure Databricks cluster.
%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
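Once the install finishes, a quick sanity check from a notebook cell is to list the drivers unixODBC can actually see; a minimal sketch:
import pyodbc

# List the ODBC drivers registered on the cluster's driver node.
# If the install above worked, "ODBC Driver 17 for SQL Server" should appear here.
print(pyodbc.drivers())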

What you posted looks like straight Python code. In the Databricks environment, things work a little differently than on your local machine.
Try it like this:
import pyodbc
server = '<server>.database.windows.net'
database = '<database>'
username = '<username>'
password = '<password>'
driver= '{ODBC Driver 17 for SQL Server}'
cnxn = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
cursor.execute("SELECT TOP 20 pc.Name as CategoryName, p.name as ProductName FROM [SalesLT].[ProductCategory] pc JOIN [SalesLT].[Product] p ON pc.productcategoryid = p.productcategoryid")
row = cursor.fetchone()
while row:
    print(str(row[0]) + " " + str(row[1]))
    row = cursor.fetchone()
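If you need the Azure AD password login from the question rather than SQL authentication, ODBC Driver 17 also understands an Authentication keyword in the connection string; a minimal sketch with placeholder values (not tested against your server):
import pyodbc

server = '<server>.database.windows.net'
database = '<database>'
username = '<user>@<tenant>.onmicrosoft.com'
password = '<password>'

# ODBC Driver 17 for SQL Server accepts an Authentication keyword for Azure AD logins;
# the server, database and credentials above are placeholders.
cnxn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=' + server + ';PORT=1433;'
    'DATABASE=' + database + ';'
    'UID=' + username + ';'
    'PWD=' + password + ';'
    'Authentication=ActiveDirectoryPassword'
)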

Related

I cannot connect to ODBC in Google Colab

I am trying to connect via ODBC from Colab, but it gives me an error. The error is OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
Could someone help me?
Here is my code:
%%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
!pip install pyodbc
import pyodbc
# SQL connection parameters
server = 'server-training.database.windows.net'
database = 'db_axm'
username = 'axm_reader'
password = 'careerp#th22*'
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=' + server + ';'
                      'DATABASE=' + database + ';'
                      'UID=' + username + ';'
                      'PWD=' + password)
I have been trying to change the code with some alternatives, but I have been unsuccessful.
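HYT00 is a login timeout, meaning the driver gave up waiting for the server to answer. Before touching the ODBC setup, it is worth checking whether Colab can reach the server at all (Azure SQL Database blocks clients that are not allowed through its firewall); a minimal sketch using the server name from the question:
import socket

server = 'server-training.database.windows.net'
try:
    # Open a raw TCP connection to the Azure SQL endpoint on port 1433.
    # If this times out, the problem is network/firewall, not the ODBC driver.
    with socket.create_connection((server, 1433), timeout=10):
        print('TCP connection to port 1433 succeeded')
except OSError as exc:
    print('Could not reach the server:', exc)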

Get Access to Airflow Hooks within a jupyter notebook

I have Airflow running with the Postgres backend - all fine. Additionally, I have a Jupyter server running on the same host as Airflow. Now I thought I could just access the Airflow hooks from a notebook.
import pandas as pd
import numpy as np
import matplotlib as plt
from airflow.hooks.mysql_hook import MySqlHook
mysql = MySqlHook(mysql_conn_id = 'mysql-x')
sql = "select 1+1"
mysql.get_pandas_df(sql)
But I get this exception message:
OperationalError: (sqlite3.OperationalError) no such table: connection
[SQL: SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted
FROM connection
WHERE connection.conn_id = ?]
[parameters: ('mysql-x',)]
(Background on this error at: http://sqlalche.me/e/e3q8)
What makes me suspicious is that not only does it fail to find the connection_id (which I can clearly see in the Airflow UI), it also says sqlite3.OperationalError - it very much looks like it is not even connected to the same Postgres database. I have checked os.environ["AIRFLOW_HOME"], which seems to be correct.
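One way to confirm that suspicion from inside the notebook is to print the connection string this Python process has actually picked up; a minimal sketch for an Airflow 1.10-style configuration (a sqlite URL here would mean the Jupyter process is not seeing the Postgres settings):
import os
from airflow.configuration import conf

# Which metadata database does this Python process think Airflow is using?
# A sqlite:/// URL here means the notebook is not picking up the Postgres config.
print(conf.get('core', 'sql_alchemy_conn'))
print(os.environ.get('AIRFLOW_HOME'))
print(os.environ.get('AIRFLOW__CORE__SQL_ALCHEMY_CONN'))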
EDIT 1:
After I start the Jupyter notebook server after Airflow, so that all environment variables are set, I get a different error:
/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/attributes.py in __get__(self, instance, owner)
351
352 def __get__(self, instance, owner):
--> 353 retval = self.descriptor.__get__(instance, owner)
354 # detect if this is a plain Python @property, which just returns
355 # itself for class level access. If so, then return us.
/usr/local/lib/python3.7/site-packages/airflow/models/connection.py in get_password(self)
188 "Can't decrypt encrypted password for login={}, \
189 FERNET_KEY configuration is missing".format(self.login))
--> 190 return fernet.decrypt(bytes(self._password, 'utf-8')).decode()
191 else:
192 return self._password
/usr/local/lib/python3.7/site-packages/cryptography/fernet.py in decrypt(self, msg, ttl)
169 except InvalidToken:
170 pass
--> 171 raise InvalidToken
InvalidToken:
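The InvalidToken error suggests the notebook process is decrypting the stored connection password with a different fernet_key than the one the Airflow webserver/scheduler used to encrypt it. A hedged sketch of the idea, to be run before importing anything from airflow; the key value is whatever your airflow.cfg actually contains:
import os

# Point the notebook at the same Fernet key Airflow used to encrypt connections.
# Replace the placeholder with the fernet_key value from your airflow.cfg.
os.environ['AIRFLOW__CORE__FERNET_KEY'] = '<fernet_key from airflow.cfg>'

from airflow.hooks.mysql_hook import MySqlHook

mysql = MySqlHook(mysql_conn_id='mysql-x')
print(mysql.get_pandas_df('select 1+1'))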
You can use this dockerfile to reproduce it:
FROM apache/airflow
USER root
# install mysql client
RUN apt-get update && apt-get install -y mariadb-client-10.3 unzip
# install mssql client and tools
RUN apt-get install -y curl gnupg libicu-dev libicu63
RUN curl https://packages.microsoft.com/keys/microsoft.asc -o key
RUN apt-key add < key
RUN curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/msprod.list
ENV ACCEPT_EULA=Y
RUN apt-get update && apt-get install -y mssql-tools msodbcsql17 unixodbc unixodbc-dev unzip libunwind8
RUN curl -Lq 'https://go.microsoft.com/fwlink/?linkid=2108814' -o sqlpackage-linux-x64-latest.zip
RUN mkdir /opt/sqlpackage/ && unzip sqlpackage-linux-x64-latest.zip -d /opt/sqlpackage/
RUN chmod a+x /opt/sqlpackage/sqlpackage && ln -sfn /opt/sqlpackage/sqlpackage /usr/bin/sqlpackage
RUN ln -sfn /opt/mssql-tools/bin/sqlcmd /usr/bin/sqlcmd
# install notebooks
RUN pip install jupyterlab pandas numpy scikit-learn matplotlib pymssql
#RUN cat /entrypoint.sh
# start additional notebok server
# this is a dirty hack but for the sake of this prototype good enough
RUN sed -i -e's/\# Run the command/airflow scheduler \& \njupyter notebook --ip=0.0.0.0 --port=9000 --NotebookApp.token="" --NotebookApp.password="" \& \n/' /entrypoint
EXPOSE 9000
# switch back to airflow user
USER airflow
RUN airflow initdb
RUN alias ll='ls -al'

How to mount Cloud Filestore in GCP AI platform Jupyter notebook?

I want to mount a Cloud Filestore instance in a GCP AI Platform Jupyter notebook instance so that I don't have to upload all of my data into the notebook.
I followed the instructions at https://cloud.google.com/filestore/docs/mounting-fileshares, but get these error messages:
root@0084329abd1b:/home# mount <IP_ADDRESS>:/streams cfs
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
root@0084329abd1b:/home# mount -o nolock <IP_ADDRESS>:/streams cfs
mount.nfs: Operation not permitted
From your terminal, you can do something like this.
mkdir des_bucket
gcsfuse --debug_gcs --implicit-dirs src_bucket des_bucket
Create a Filestore instance link
Create a Google VM instance link
Create a Notebook AI instance link
On the VM instance run the commands:
sudo apt-get -y update
sudo apt-get -y install nfs-common
sudo mkdir test
# fileshare remote target
sudo mount 111.11.111.11:/fileshare test
sudo chmod go+rw test
echo 'This is a test' > test/testfile
ls test
#testfile
On the Notebook AI instance run the commands link:
sudo apt-get -y update
sudo apt-get -y install nfs-common
sudo mkdir test
# fileshare remote target
sudo mount 111.11.111.11:/fileshare test
ls test
#testfile
You can also check link
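After the mount succeeds, you can confirm from inside the notebook that the share is visible; a small sketch assuming the test directory and testfile created above:
import os

# List the mounted Filestore share and read back the file written from the VM.
mount_point = 'test'  # the directory the share was mounted on above
print(os.listdir(mount_point))
with open(os.path.join(mount_point, 'testfile')) as f:
    print(f.read())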

How to persist/keep sqlite database in docker container application? [duplicate]

I'm new to Docker. Is it possible to embed a sqlite database in a docker container and have it updated every time my script in that container runs?
Dockerfile example to install sqlite3
FROM ubuntu:trusty
RUN sudo apt-get -y update
RUN sudo apt-get -y upgrade
RUN sudo apt-get install -y sqlite3 libsqlite3-dev
RUN mkdir /db
RUN /usr/bin/sqlite3 /db/test.db
CMD /bin/bash
Persist the db file inside the host OS folder /home/dbfolder:
docker run -it -v /home/dbfolder/:/db imagename
If you want to persist the data in SQLite, use a host directory/file as a data volume.
Refer to the "Mount a host directory as a data volume" section in
https://docs.docker.com/storage/volumes/
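To make the "updated every time my script runs" part concrete, the script inside the container only has to write to the database file under the mounted path; a minimal sketch using Python's built-in sqlite3 module and the /db/test.db path from the Dockerfile above:
import datetime
import sqlite3

# /db is the bind-mounted host folder (/home/dbfolder on the host), so rows
# written here survive container restarts and repeated runs of this script.
conn = sqlite3.connect('/db/test.db')
conn.execute('CREATE TABLE IF NOT EXISTS runs (ran_at TEXT)')
conn.execute('INSERT INTO runs VALUES (?)', (datetime.datetime.utcnow().isoformat(),))
conn.commit()
conn.close()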

How to Choose R Server's R as Default in Operationalization, Remote R Workspace and RStudio Server?

So I've set up an Azure Data Science Virtual Machine on Linux (Ubuntu) and I've executed the following on the terminal to enable Remote R workspace, RStudio Server, R Server Operationalization and hadoop:
sudo apt update
sudo apt -y upgrade
# Hadoop is installed but doesn't seem to appear on the PATH or have its environment variable set by default
sudo echo "" >> ~/.bashrc
sudo echo "export PATH="'$'"PATH:/opt/hadoop/hadoop-2.7.4/bin" >> ~/.bashrc
sudo echo "export HADOOP_HOME=/opt/hadoop/hadoop-2.7.4" >> ~/.bashrc
#
source ~/.bashrc
#Setting up a password as none exists to begin with because of private key selection in the installation
#RStudio Server requires a password though
"MyPassword\nMyPassword\n" | sudo passwd sshuser
#Unfortunately hadoop fails on Data Science Virtual Machine
#error: mkdir: Call From IM-DSonUbuntu/192.168.5.4 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
# hadoop fs -mkdir /user/RevoShare/rserve2
# hadoop fs -chmod uog+rwx /user/RevoShare/rserve2
sudo mkdir -p /var/RevoShare/rserve2
sudo chmod uog+rwx /var/RevoShare/rserve2
# hadoop fs -mkdir /user/RevoShare/sshuser
# hadoop fs -chmod uog+rwx /user/RevoShare/sshuser
sudo mkdir -p /var/RevoShare/sshuser
sudo chmod uog+rwx /var/RevoShare/sshuser
#Setting up R Server Operationalisation
cd /opt/microsoft/mlserver/9.2.1/o16n
sudo dotnet Microsoft.MLServer.Utils.AdminUtil/Microsoft.MLServer.Utils.AdminUtil.dll -silentoneboxinstall MyPassword
#They say this Data Science Virtual Machine already has RStudio Server, but even though the port 8787 is open, it's nowhere to be found! So installing it now, and after the installation it's accessible by refreshing the page that failed before.
#Perhaps it's not installed then? Or a service is not running like it should?
#https://www.rstudio.com/products/rstudio/download-server/
wget https://download2.rstudio.org/rstudio-server-1.1.414-amd64.deb
yes | sudo gdebi rstudio-server-1.1.414-amd64.deb
#They are small, leave them for debug reasons - let's have evidence the script ran this far.
#sudo rm rstudio-server-1.1.414-amd64.deb
# Remote R workspace Service needs dotnet sdk
curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
sudo mv microsoft.gpg /etc/apt/trusted.gpg.d/microsoft.gpg
sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-xenial-prod xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt update
sudo apt -y install dotnet-sdk-2.0.0
sudo apt install libxml2-dev
#Downloading and installing the Remote R service
wget -O rtvs-daemon.tar.gz https://aka.ms/r-remote-services-linux-binary-current
tar -xvzf rtvs-daemon.tar.gz
sudo ./rtvs-install -s
sudo systemctl enable rtvsd
sudo systemctl start rtvsd
#sudo rm rtvs-daemon.tar.gz
#sudo rm rtvs-install
#Fixing Remote R: For some reason, even though 'sudo systemctl enable rtvsd' runs, after every reboot the service won't become automatically active. So let's fix that.
wget https://sa0im0general.blob.core.windows.net/general-blob-container/StartRemoteRAfterReboot.sh
sudo mv StartRemoteRAfterReboot.sh /var/RevoShare/StartRemoteRAfterReboot.sh
sudo /sbin/shutdown -r 5
sudo chown root /etc/rc.local
sudo chmod 755 /etc/rc.local
sudo systemctl enable rc-local.service
sudo -s
sudo find /etc/ -name "rc.local" -exec sed -i 's/exit 0//g' {} \;
sudo echo "" >> /etc/rc.local
sudo echo "sh /var/RevoShare/StartRemoteRAfterReboot.sh" >> /etc/rc.local
sudo echo "exit 0" >> /etc/rc.local
exit
I've also tried these, one by one, to see if they make any difference to RStudio Server (they didn't, but even if they did, I want a global solution that works for the Remote R Workspace Service and R Server Operationalisation as well, not only RStudio Server):
#Configuring RStudio Server to see the R Server R
sudo echo "rsession-which-r=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> /etc/rstudio/rserver.conf
export RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R
sudo echo "RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> ~/.profile
source ~/.profile
sudo echo "RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> ~/.bashrc
source ~/.bashrc
sudo echo "PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R" >> ~/.bashrc
export PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R
source ~/.bashrc
The problem is that even though "which R" points to R Server's R, i.e. typing "sudo R" will show the message "Loading Microsoft R Server packages, version 9.2.1." and will load packages like RevoScaleR, everything else fails to do so.
Accessing the RStudio Server at http://THE-IP-GOES-HERE.westeurope.cloudapp.azure.com:8787 and logging in with the initial user ("sshuser") (or with any other user, for that matter) will NOT load R Server, and the RevoScaleR rx functions are unavailable.
Using my local Visual Studio 2017 to access the remote workspace via "Add connection" on "Workspaces" tab loads MRO and says:
Installed R versions:
[0] Microsoft R Open '3.4.1.1347' (Default)
And finally, when I use R Server's Operationalisation and log in with the "mrsdeploy" package's "remoteLogin()", R Server packages like RevoScaleR are again not loaded, so things like "rxSummary(~., data=iris)" fail with the error 'could not find function "rxSummary"'.
The exact same thing happened when I deployed a "Machine Learning Server 9.2.1 on Linux (Ubuntu)" from Azure.
I don't want to just use the regular open source R, I want to be able to use the R Server - that's why I deployed this VM. How can I make it so that everything loads R Server's R, not Microsoft R Open? (Like I'm able to do from terminal using "R")
Having tried all of this, and given that R Server does load in the console, my mind now goes to permissions. Could it be that by default the Data Science VM doesn't have the correct permissions to allow this?
I'm at a loss.
RStudio Server is installed on the Ubuntu DSVM, but the service is disabled by default as it does not support SSL. You can enable it with systemctl enable rstudio-server, then start it with systemctl start rstudio-server.
RStudio Server uses the same R as Microsoft R Server, but the .libPaths are different, which is why you cannot load the MRS packages. You will need to manually set the .libPaths so they match.
