How to install ODBC Driver 17 for SQL Server on an Azure Databricks cluster with no internet access

My goal is to access Azure Synapse Analytics from Azure Databricks. The first thing that came to mind is to use the Spark driver com.databricks.spark.sqldw. But for that, the database user needs to be db_owner in the database, which is not suitable, since users could mess around with Synapse. I just want users to read data from Synapse using their own Active Directory accounts.
My second approach is to try using the ODBC driver (or JDBC) to access Synapse as we normally do in local Python scripts. The problem is that our Databricks clusters have no internet access, so we can't just run apt-get-like commands (in order to install the ODBC drivers).
So, an answer to any of these questions may help me solve the problem:
How do I copy a file from an Azure Storage Account Gen2 to the local Databricks cluster file system? I put the ODBC Driver 17 for SQL Server package (msodbcsql17_17.10.1.1-1_amd64.deb) in a container in the Storage Account and I can see it using dbutils. But can I copy that file to the Databricks cluster filesystem?
Is it possible to use the default spark driver com.databricks.spark.sqldw for accessing Azure Synapse Analytics with SELECT permission only?
For anyone wondering why I'm not using the Synapse Apache Spark pool: it's because I can't run queries like SELECT * FROM A INNER JOIN B.... And of course, the Databricks UI is much better ; )
Thanks for any help.

Generically answering my own question of "how do I access a file from Azure Storage Account Gen2 using the Databricks cluster filesystem": it is quite simple. I just had to mount it using dbutils, and the mount automatically appears under /dbfs/mnt. For instance (Python script):
dbutils.fs.mount(
  source = "wasbs://<container>@<storage>.blob.core.windows.net",
  mount_point = "/mnt/my_adlsgen2",
  extra_configs = {"fs.azure.account.key.<storage>.blob.core.windows.net": "<account-key>"})
and the mount will be at:
/dbfs/mnt/my_adlsgen2
Finally, you can go to the above folder and list files or do whatever you want with it.
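For instance, to quickly confirm the mount works from a notebook cell (just a sketch; the paths are the ones from the example above):
# List the files visible through the mount point
display(dbutils.fs.ls("/mnt/my_adlsgen2"))

# The same files are also reachable through the local filesystem path used by %sh cells
import os
print(os.listdir("/dbfs/mnt/my_adlsgen2"))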
In order to install the ODBC driver that is in ADLS Gen2, I had to open a new cell and run:
%sh
echo msodbcsql17 msodbcsql/ACCEPT_EULA boolean true | sudo debconf-set-selections
sudo dpkg -i /dbfs/mnt/my_adlsgen2/msodbcsql17_17.10.1.1-1_amd64.deb
And checking if it installed ok:
%sh
odbcinst -j
unixODBC 2.3.6
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /root/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
and lastly:
%sh cat /etc/odbcinst.ini
[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.1.1
UsageCount=1
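With the driver registered, a Python cell can query Synapse through pyodbc using each user's own Active Directory account and nothing more than SELECT permission. The sketch below assumes pyodbc is already available on the cluster (installed from an internal mirror, since there is no internet access); the server, database, table and user names are placeholders:
import pyodbc

# Placeholder Synapse endpoint, database and credentials: replace with your own values
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<workspace>.sql.azuresynapse.net;"
    "Database=<database>;"
    "Authentication=ActiveDirectoryPassword;"  # Azure AD user/password authentication
    "UID=<user>@<domain>;"
    "PWD=<password>",
    autocommit=True)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM dbo.<table>")  # read-only query, SELECT permission is enough
for row in cursor.fetchall():
    print(row)
conn.close()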

Related

MariaDB Galera Cluster: issue with replication

Here is my setup:
4 VMs (running on CentOS 7)
VM1 with mariadb-client and maxscale for load balancing (I have tried haproxy; results are the same), plus httpd and php (I am testing this with a WordPress installation)
VM2, VM3, VM4 with mariadb-server, galera, rsync
Software installation
adding repository "curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash" on all 4 VMs
installing MariaDB-server on VM2, VM3, VM4 (this includes galera and all the required software)
installing maxscale and MariaDB-client on VM1
Editing config files
on VM2, VM3, VM4 I have added:
https://gist.github.com/yarko686/5adb7b24784c4c3c24a526519623d930
to /etc/my.cnf.d/server.cnf
on VM1 I have added the following lines to /etc/maxscale.cnf https://gist.github.com/a67e94afaa4ecc57ccb985d897ee3e87.git
Starting the cluster
on VM2 I have executed galera_new_cluster
on VM3 and VM4 I have executed systemctl start mariadb
Checking the cluster
on VM2 I am accessing mysql using mysql -u root then executing:
show global status like 'wsrep_cluster_size';
I receive this output https://gist.github.com/yarko686/a63c925b3275d239f38d50f0651e45ef, which means that there are 3 machines in the cluster.
Creating maxscale user and wordpress users
Login to MySQL CLI on VM2 using mysql -u root and executing the following commands
https://gist.github.com/yarko686/950ea62f79638a6f293c28b99dd19f7b
For the WordPress user I use the same commands, except that in these cases I'm using wordpress_db.* instead.
The main issue
After importing the WordPress database, it is properly created on VM2 only. On VM3 and VM4 the database and tables are created, however, for some reason they are empty.
If I access the wordpress database through the MySQL CLI using my wordpress user and create a new table with some data, it gets replicated. But when I add a user to my wp_users table (or add a user through wp-admin), it is not replicated. The record gets created only on VM2 and not on VM3 and VM4.
Check to see if the tables are InnoDB instead of MyISAM.
I know on my setup, when I imported old MyISAM tables, the tables would appear but the data wouldn't replicate, because Galera only replicates InnoDB tables. I had to convert all of the tables to InnoDB.
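A quick way to check (and fix) the storage engines, sketched here in Python with the PyMySQL package (the package choice and the wordpress_db name are assumptions; the same two SQL statements can just as well be run from the mysql CLI):
import pymysql

# Connect to the node that received the import (VM2 in this setup)
conn = pymysql.connect(host="localhost", user="root", password="", database="information_schema")

with conn.cursor() as cur:
    # Galera only replicates InnoDB tables, so list everything that is not InnoDB
    cur.execute(
        "SELECT TABLE_NAME, ENGINE FROM TABLES "
        "WHERE TABLE_SCHEMA = %s AND ENGINE <> 'InnoDB'",
        ("wordpress_db",))
    for table_name, engine in cur.fetchall():
        print("converting %s from %s to InnoDB" % (table_name, engine))
        # Identifiers cannot be bound as parameters; these names come from information_schema
        cur.execute("ALTER TABLE wordpress_db.`%s` ENGINE=InnoDB" % table_name)

conn.close()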

Permission Issue with Docker Volume Driver for Azure File Storage

I am following the readme for this project (https://github.com/Azure/azurefile-dockervolumedriver/blob/master/contrib/init/upstart/README.md), but when I try to mount a volume on a container like this:
docker volume create -d azurefile -o share=myshare --name=myvol
docker run -i -t -v myvol:/data busybox
(inside the container)
# cd /data
# touch file.txt
I get this error:
Error response from daemon: VolumeDriver.Mount: mount failed: exit status 32
output="mount.cifs kernel mount options: ip=168.61.57.82,unc=\\\\cmstoragecd.file.core.windows.net\\myshare,vers=3.0,dir_mode=0777,file_mode=0777,user=cmstoragecd,pass=********\nmount
error(13): Permission denied\nRefer to the mount.cifs(8) manual page (e.g. man mount.cifs)\n"
This is running on an Ubuntu 14.04 server on Azure. I have successfully used the extension with similar servers, but it is now not working. What can I do to debug this?
Your answer is correct. CIFS in many Linux distros currently does not have encryption support, which Azure File Storage requires for cross-region SMB traffic.
Quoting the note at https://azure.microsoft.com/en-us/documentation/articles/storage-how-to-use-files-linux/
Note: The Linux SMB client doesn’t yet support encryption, so mounting a file share from Linux still requires that the client be in the same Azure region as the file share. However, encryption support for Linux is on the roadmap of Linux developers responsible for SMB functionality. Linux distributions that support encryption in the future will be able to mount an Azure File share from anywhere as well.
In the future, please consider contacting us directly by opening a new issue on our GitHub repository at: https://github.com/Azure/azurefile-dockervolumedriver/issues.
I managed to get around this error by using a storage account in the same region as the Azure VM. Originally I had a VM running in West Europe, using a file share in East US.

Oracle 11g XE installation on docker RHEL 7 image

While installing Oracle 11g XE on Docker I am getting an error.
Following is the output:
/etc/init.d/oracle-xe configure
Oracle Database 11g Express Edition Configuration
This will configure on-boot properties of Oracle Database 11g Express
Edition. The following questions will determine whether the database should
be starting upon system boot, the ports it will use, and the passwords that
will be used for database accounts. Press to accept the defaults.
Ctrl-C will abort.
Specify the HTTP port that will be used for Oracle Application Express [8080]:8080
Specify a port that will be used for the database listener [1521]:1521
Specify a password to be used for database accounts. Note that the same
password will be used for SYS and SYSTEM. Oracle recommends the use of
different passwords for each database account. This can be done after
initial configuration:
Confirm the password:
Do you want Oracle Database 11g Express Edition to be started on boot (y/n) [y]:y
Starting Oracle Net Listener...Done
Configuring database...
Database Configuration failed. Look into /u01/app/oracle/product/11.2.0/xe/config/log for details
[root@b7c63c4e1da8 Disk1]# cd /u01/app/oracle/product/11.2.0/xe/config/log
[root@b7c63c4e1da8 log]# ls
CloneRmanRestore.log cloneDBCreation.log postDBCreation.log postScripts.log
[root@b7c63c4e1da8 log]# cat CloneRmanRestore.log
ORA-00845: MEMORY_TARGET not supported on this system
select TO_CHAR(systimestamp,'YYYYMMDD HH:MI:SS') from dual
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0
declare
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0
select TO_CHAR(systimestamp,'YYYYMMDD HH:MI:SS') from dual
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0
One possible solution I found was to give the temp filesystem more space, since it only has about 6 GB inside the Docker container. But I am unable to remount it in Docker.
Got the solution for this:
We have to modify the files init.ora and initXETemp.ora at the path /u01/app/oracle/product/11.2.0/xe/config/scripts
with the following values:
###########################################
# Miscellaneous
###########################################
compatible=11.2.0.0.0
diagnostic_dest=/u01/app/oracle
#memory_target=1073741824
pga_aggregate_target=200540160
sga_target=601620480
You may encounter
ORA-00845: MEMORY_TARGET not supported on this system
when starting Oracle DB in an unprivileged container. Try running the container with the --privileged flag, e.g.
docker run --name oracle12 --hostname oracledb --privileged local/oracle12:12.1.0.2

Ubuntu Shiny server connecting to Jet/ACE databases

Can it be done: Reading data stored in an MS Access (.accdb) database, from within Shiny apps running on Ubuntu Shiny server?
We have no knowledge of SQL Server Express. We have our data organized in simple MS Access databases, and want to deploy our Shiny apps (which visualize this data) on an Ubuntu Shiny server.
It all works on our local Windows machines, but how to make it also work with an Ubuntu Shiny server?
I understand that with our minimal knowledge of database systems, it is not straightforward to port our databases to SQL Server Express.
Thanks in advance for your expertise!
I had a bit of a job setting this up myself. I had to take info from several sources to get all the required packages – the following is a list of good info sources:
http://guywyant.info/log/206/connecting-to-ms-sql-server-from-ubuntu/
http://driftharmony.wordpress.com/2008/08/15/connecting-ubuntu-804-to-microsoft-sql-server/
https://code.google.com/p/django-pyodbc/wiki/FreeTDS
FreeTDS working, but ODBC cannot connect
The 3 files were ultimately configured thus:
Detail of file: /etc/odbc.ini
[NameThis]
Driver = FreeTDS
TDS_Version=8.0
Servername = YourServer
Port = 1433
Database = testing
Trace = No
Detail of file: /etc/odbcinst.ini
[FreeTDS]
Description = FreeTDS
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Detail of file: /etc/freetds/freetds.conf
# $Id: freetds.conf,v 1.12 2007/12/25 06:02:36 jklowden Exp $
# This file is installed by FreeTDS if no file by the same name is found in the installation directory.
# For information about the layout of this file and its settings, see the freetds.conf manpage "man freetds.conf".
# Global settings are overridden by those in a database server specific section
[global]
# TDS protocol version
; tds version = 4.2
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
; dump file = /tmp/freetds.log
; debug flags = 0xffff
# Command and connection timeouts
; timeout = 10
; connect timeout = 10
# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field. Try setting 'text size' to a more reasonable limit
text size = 64512
# Test Kx
[NameThis]
host = YOUR IP
port = 1433
tds version = 7.2
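Once the three files are in place, the DSN can be tested independently of Shiny. Here is a minimal sketch in Python with pyodbc (the DSN name comes from the odbc.ini above; the credentials and query are placeholders, and the Shiny app itself would use an R ODBC package against the same DSN):
import pyodbc

# "NameThis" is the DSN defined in /etc/odbc.ini; user and password are placeholders
conn = pyodbc.connect("DSN=NameThis;UID=<username>;PWD=<password>")
cursor = conn.cursor()
cursor.execute("SELECT 1")  # trivial query, just to prove the connection works
print(cursor.fetchone())
conn.close()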

Restore Postgres Database from pg_data directory

I have a broken VM that won't boot, with an old PostgreSQL database on it (it used to run PostgreSQL 8.4). I have access to the file system (and the pg_data directory).
How can I extract the data (or restore the database) from this data directory?
Is it as simple as copying the contents of this directory into a working 8.4 pg_data directory?
Actually it is basically that simple. Here are the steps I took to get this working:
1) Archive the data directory (/var/lib/postgres/8.4/data) into a tar.gz file.
2) Move the file to a working workstation (my desktop, running a Debian-based distribution of Linux)
3) Install the PostgreSQL APT repository and install postgresql-8.4 (or whichever version was on the broken server) using the instructions found at the PostgreSQL Linux downloads page for Ubuntu.
4) Extract the contents of the tar.gz file into the main directory for the "new" PostgreSQL 8.4 installation (/var/lib/postgresql/8.4/main/).
5) Modify the postgresql.conf to change the port = 5432 to port = 5433. This allows us to control which version of PostgreSQL we connect to using the port number (assuming we have the latest stable version on our workstation, such as 9.1). So 9.1 will stay on the default 5432, and 8.4 will be on 5433.
6) Modify the ownership of the extracted data directory so postgres is the owner: chown -R postgres:postgres /var/lib/postgresql/8.4/main/*
7) Start the postgres service: service postgresql start (you'll see both versions start up)
8) su as postgres and connect using port 5433, and the database name that was on the old server: psql -p 5433 DatabaseName
