Requirement: run an R script that connects from R to a Snowflake database.
I am trying to set up a Docker image that can communicate with a Snowflake database through R (using either RODBC or ODBC).
Error:
The problem seems to be that it fails to install (or locate) the necessary Snowflake drivers when establishing a connection:
Error: nanodbc/nanodbc.cpp:1021: 00000: [unixODBC][Driver Manager]Can't open lib 'SnowflakeDSIIDriver': file not found
The connection details are given below:
# Load required libraries
library(paws)
library(DBI)
library(odbc)
library(anomalize)
con <- DBI::dbConnect(
odbc::odbc(),
Driver = "SnowflakeDSIIDriver",
Server = "account.snowflakecomputing.com",
Database = "DEV",
Schema = "SCHEMA",
Warehouse = "WH_XS",
UID = username,
PWD = password
)
Here is my current Dockerfile template:
ARG AIRFLOW_VERSION=2.3.3
ARG PYTHON_RUNTIME_VERSION=3.8
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_RUNTIME_VERSION}
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
USER 0
RUN sudo apt-get update && apt-get install -y r-base r-base-core r-base-dev libssl-dev libcurl4-openssl-dev libgdal-dev && \
rm -r /var/lib/apt/lists/*
RUN R -e "install.packages(c('httr','shiny','jsonlite','data.table','forecast','anomalize','tibbletime','DBI','dplyr','dbplyr','odbc','ggplot2','DT','getip','shinyTime','paws','RODBC'), repos = 'https://cran.rstudio.com/')"
Your Dockerfile template looks good; it only needs the driver installation steps. Try this:
ARG AIRFLOW_VERSION=2.3.3
ARG PYTHON_RUNTIME_VERSION=3.8
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_RUNTIME_VERSION}
SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
USER 0
RUN sudo apt-get update && apt-get install -y r-base r-base-core r-base-dev libssl-dev libcurl4-openssl-dev libgdal-dev && \
rm -r /var/lib/apt/lists/*
RUN R -e "install.packages(c('httr','shiny','jsonlite','data.table','forecast','anomalize','tibbletime','DBI','dplyr','dbplyr','odbc','ggplot2','DT','getip','shinyTime','paws','RODBC'), repos = 'https://cran.rstudio.com/')"
RUN curl -L -O https://sfc-repo.snowflakecomputing.com/odbc/linux/2.25.7/snowflake-odbc-2.25.7.x86_64.deb
RUN dpkg -i snowflake-odbc-2.25.7.x86_64.deb
ENTRYPOINT ["/bin/bash"]
In the example above, I hardcoded the download link for the latest driver version to keep the example concrete, but you can also replace the version with a variable.
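One way to replace the hardcoded link with a variable is sketched below. The variable name SNOWFLAKE_ODBC_VERSION and the helper snowflake_deb_url are hypothetical; the URL pattern is copied from the link in the Dockerfile above, and it is an assumption (not verified here) that other versions are published under the same layout.

```shell
# Hypothetical variable; in a Dockerfile this could come from an ARG.
SNOWFLAKE_ODBC_VERSION="${SNOWFLAKE_ODBC_VERSION:-2.25.7}"

# Build the .deb download URL for a given driver version.
# The pattern mirrors the hardcoded link; other versions are assumed
# to follow the same path layout.
snowflake_deb_url() {
  printf 'https://sfc-repo.snowflakecomputing.com/odbc/linux/%s/snowflake-odbc-%s.x86_64.deb\n' "$1" "$1"
}

snowflake_deb_url "$SNOWFLAKE_ODBC_VERSION"
```

In the Dockerfile itself, the same idea would be an ARG declared after the FROM line (an ARG declared before FROM is only visible to FROM), with the curl and dpkg lines referring to that ARG instead of the literal 2.25.7.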
Additionally, in your connection call, the UID and PWD parameters should be lowercase (uid and pwd). See my example below:
library(DBI)
library(odbc)
con <- DBI::dbConnect(
odbc::odbc(),
Driver = "SnowflakeDSIIDriver",
Server = "account.snowflakecomputing.com",
Database = "DEV",
Schema = "SCHEMA",
Warehouse = "WH_XS",
uid = "USERNAME",
pwd = 'password'
)
mydata <- DBI::dbGetQuery(con,"SELECT CURRENT_TIMESTAMP()")
mydata
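The error "Can't open lib 'SnowflakeDSIIDriver': file not found" is what unixODBC reports when the Driver value is neither a registered driver name nor a path to a shared library. As a rough sketch, the check below looks for a [SnowflakeDSIIDriver] section in an odbcinst.ini-style file. The helper name odbc_driver_registered and the sample file contents (including the library path) are made up for illustration; on a real image the file is commonly /etc/odbcinst.ini, and odbcinst -q -d lists the registered driver names.

```shell
# Return 0 if an odbcinst.ini-style file has a section named exactly "$1".
# Arguments: driver name, path to the ini file.
odbc_driver_registered() {
  grep -Fxq "[$1]" "$2"
}

# Demo against a throwaway file; the Driver path is a made-up placeholder.
ini_file="$(mktemp)"
cat > "$ini_file" <<'EOF'
[SnowflakeDSIIDriver]
Description=Snowflake ODBC driver (sample entry)
Driver=/path/to/libSnowflake.so
EOF

if odbc_driver_registered "SnowflakeDSIIDriver" "$ini_file"; then
  echo "registered"
else
  echo "not registered"
fi
rm -f "$ini_file"
```

If the section is missing inside the container, the name passed as Driver in dbConnect() has nothing to resolve to, which would match the error in the question.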
I have the following Dockerfile:
FROM rocker/tidyverse:3.5.2
RUN apt-get update
# System dependencies for R packages
RUN apt-get install -y \
git \
make \
curl \
libcurl4-openssl-dev \
libssl-dev \
pandoc \
libxml2-dev \
unixodbc \
libsodium-dev \
tzdata
# Clean up package installations
RUN apt-get clean
# ODBC system dependencies
RUN apt-get install -y gnupg apt-transport-https
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install msodbcsql17 -y
# Install renv (package management)
ENV RENV_VERSION 0.11.0
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"
# Specify USER for rstudio session
ENV USER rstudio
COPY ./renv.lock /renv/tmp/renv.lock
WORKDIR /renv/tmp
RUN R -e 'renv::consent(provided = TRUE)'
RUN R -e "renv::restore()"
WORKDIR /home/$USER
I use this image to recreate environments for R scripting purposes. This was working for a number of months up until the end of September when I started getting:
Error in curl::curl_fetch_memory(url, handle = handle) :
SSL certificate problem: certificate has expired
This occurred when using a GET request to query a website. How do I update my certificate, now and in the future, to avoid certificates expiring? I do not want to use the "config(ssl_verifypeer = FALSE)" workaround.
This happened to me too. Any chance you are working on a macOS or Linux-based machine? It seems to be a bug:
https://security.stackexchange.com/questions/232445/https-connection-to-specific-sites-fail-with-curl-on-macos
ca-certificates Mac OS X
SSL Certificates - OS X Mavericks
Instead of adjusting the certificates as suggested in those links, you can simply set R to not verify the peer. Just add the following line to the beginning of your code:
httr::set_config(httr::config(ssl_verifypeer = FALSE, ssl_verifyhost = FALSE))
I am deploying a simple plumber API in an Azure container; it executes a simple SQL query and returns the results. Here is my Dockerfile:
# start from the rocker/r-ver:4.0.2 image
# minor change to one branch to test deployment
FROM rocker/r-ver:4.0.2
# install the linux libraries needed for plumber
RUN apt-get update -qq && apt-get install -y \
libssl-dev \
libcurl4-gnutls-dev \
libsodium-dev \
unixodbc-dev \
r-cran-rodbc \
curl \
libxml2-dev
# libiodbc2-dev
# We need to update Dockerfile to install ODBC Driver 17 for SQL Server for
# Ubuntu 20.04 or different version
# here is link: https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15#ubuntu17
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y msodbcsql17
RUN ACCEPT_EULA=Y apt-get install -y mssql-tools
RUN echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
RUN echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
# RUN source /etc/bash.bashrc
RUN mkdir -p ~/application
# copy everything from the current directory into the container
COPY "/" "application/"
# I probably do not even have to set up WORKDIR
WORKDIR "application/"
# open port 80 to traffic
EXPOSE 80
# install plumber
RUN R -e "install.packages('plumber')"
RUN R -e "install.packages('renv')"
RUN R -e "install.packages('RODBC')"
RUN R -e 'renv::restore()'
# when the container starts, start the main.R script
ENTRYPOINT ["Rscript", "execute_plumber.R"]
The API works fine. However, when I connect to the container, run R, and execute library(RODBC), I get an error saying the library was not found. This is puzzling, since it is already installed in the Dockerfile, but maybe it has something to do with renv. If I then run install.packages('RODBC'), it fails with an error that I can fix by running apt-get install unixodbc-dev and apt-get install r-cran-rodbc. Those lines are, again, already present in the Dockerfile. Then, when I try to connect to SQL Server, I get an error about the driver, and again I can fix it by re-running the lines from the Dockerfile that install the ODBC Driver 17 for SQL Server.
So I am puzzled why I need to re-execute steps that are already in the Dockerfile. Or maybe when I click "Connect" in Azure, it actually connects me to the host and not to the container?
I am trying to create a Dockerfile that builds an image from rocker/tidyverse and includes Spark installed through sparklyr. Previously, in this post (Unable to install spark with sparklyr in Dockerfile), I was trying to figure out why Spark wouldn't download from my Dockerfile. After playing with it for the past 5 days, I think I have found the reason but have no idea how to fix it.
Here is my Dockerfile:
# start with the most up-to-date tidyverse image as the base image
FROM rocker/tidyverse:latest
# install openjdk 8 (Java)
RUN apt-get update \
&& apt-get install -y openjdk-8-jdk
# Install devtools
RUN Rscript -e 'install.packages("devtools")'
# Install sparklyr
RUN Rscript -e 'devtools::install_version("sparklyr", version = "1.5.2", dependencies = TRUE)'
# Install spark
RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
RUN mv /root/spark /opt/ && \
chown -R rstudio:rstudio /opt/spark/ && \
ln -s /opt/spark/ /home/rstudio/
RUN apt-get install -y unixodbc unixodbc-dev --install-suggests
RUN apt-get install -y odbc-postgresql
RUN install2.r --error --deps TRUE DBI
RUN install2.r --error --deps TRUE RPostgres
RUN install2.r --error --deps TRUE dbplyr
It has no problem downloading everything up until this line:
RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
Which then gives me the error:
Step 5/11 : RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
---> Running in 739775db8f12
Error in download.file(installInfo$packageRemotePath, destfile = installInfo$packageLocalPath, :
download from 'https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz' failed
Calls: <Anonymous>
Execution halted
ERROR: Service 'rocker_sparklyr' failed to build : The command '/bin/sh -c Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'' returned a non-zero code: 1
After doing some research, I thought it was a timeout error, so I ran this beforehand:
RUN Rscript -e 'options(timeout=600)'
This did not increase the time it took to error out. I installed everything on my personal machine through RStudio, and it installed with no problems. I think the problem is specific to Docker, in that it isn't able to download from https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
I have found very little documentation on this problem and am relying heavily on this post to figure it out. Thank you in advance to anyone with this knowledge for reaching out.
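A note on the timeout attempt: each RUN instruction starts a new process, so options(timeout=600) set in one Rscript call is gone by the time the next RUN starts a fresh R session. The sketch below shows the same effect with plain shell processes (the variable name DEMO_TIMEOUT is made up). For R, the option would need to be set in the same Rscript call as spark_install(). Whether a longer timeout resolves this particular download failure is not something the post confirms.

```shell
# Make sure the demo variable is not inherited from the environment.
unset DEMO_TIMEOUT

# First process: sets the variable, then exits (like the first RUN).
sh -c 'DEMO_TIMEOUT=600'

# Second process: starts fresh and does not see the value (like the next RUN).
second_run="$(sh -c 'echo "${DEMO_TIMEOUT:-unset}"')"
echo "second process sees: $second_run"

# Same process: the setting is visible to the command that follows it.
same_run="$(sh -c 'DEMO_TIMEOUT=600; echo "${DEMO_TIMEOUT:-unset}"')"
echo "same process sees: $same_run"

# The R equivalent for a Dockerfile would be a single call, for example:
# Rscript -e 'options(timeout = 600); sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
```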
Download that version yourself and then use this function to install it:
sparklyr::spark_install_tar(tarfile = "~/spark/spark-3.0.1-bin-hadoop3.2.tgz")
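As a sketch of the "download it yourself" step, the helper below rebuilds the archive URL from the Spark and Hadoop versions. The function name spark_tgz_url is hypothetical, and the URL pattern is taken from the error message in the question.

```shell
# Build the Apache archive URL for a Spark release tarball.
# Arguments: Spark version, Hadoop version (pattern copied from the error log).
spark_tgz_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s-bin-hadoop%s.tgz\n' "$1" "$1" "$2"
}

spark_tgz_url 3.0.0 3.2

# Possible use (not run here): fetch the file under its original name,
# then point spark_install_tar() at it.
# url="$(spark_tgz_url 3.0.0 3.2)"
# curl -fL -o "/tmp/${url##*/}" "$url"
# Rscript -e 'sparklyr::spark_install_tar(tarfile = "/tmp/spark-3.0.0-bin-hadoop3.2.tgz")'
```

Keeping the original file name matters because spark_install_tar() expects a tar file whose name follows the spark-<version>-bin-hadoop<version> pattern.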
I'm trying to access a SQL Server on a local intranet from R, where R runs in a Docker container based on r-base.
On my desktop, I create an ODBC connection, and it recognizes my Windows credentials, with the following:
odbcDriverConnect('driver={SQL Server};server=<serverName>;database=<dbName>;trusted_connection=yes')
Using R-base, after a lot of trial and error I was able to get RODBC installed, but I have no idea how to connect to this server.
Warning messages:
[RODBC] ERROR: state 01000, code 0, message [unixODBC][Driver Manager]Can't open lib 'SQL Server' : file not found
Here is the Dockerfile, which was built with a lot of trial and error. I'm sure much of it is redundant. I also exposed ports that I found while doing some research, but I have no idea whether they need to be exposed.
FROM openanalytics/r-base
# system libraries of general use
RUN apt-get update && apt-get install -y \
sudo \
libcairo2-dev \
libxt-dev \
libcurl4-gnutls-dev \
libssl-dev \
libssh2-1-dev \
libssl1.0.0
# RODBC
RUN apt-get update && \
apt-get install -y --no-install-recommends \
unixodbc \
unixodbc-dev \
unixodbc \
unixodbc-dev \
r-cran-rodbc \
apt-transport-https \
libssl-dev \
libsasl2-dev \
openssl \
curl \
unixodbc \
gnupg \
libc6 libc6-dev libc6-dbg
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
#RUN curl https://packages.microsoft.com/config/ubuntu/19.10/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update -y
RUN ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev mssql-tools
EXPOSE 3838
EXPOSE 8787
Once I build the image, I open RStudio and run install.packages("RODBC").
How can I connect to the SQL Server from the Docker container?
----------Update
After creating a Docker container with Ubuntu, I found that I was not able to ping the server name. I was, however, able to ping the server IP address.
I modified the connection string to use the IP address and changed the driver, as follows:
odbcDriverConnect('driver={ODBC Driver 17 for SQL Server};server=<serverName>;database=<dbName>;trusted_connection=yes')
This still did not work.
I could not figure out how to connect to the database with the Windows credentials of the docker host inside of the docker virtual machine.
I had to make a username and password for the database and connect with that.
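For reference, a username/password connection string for the driver installed in the Dockerfile could be assembled as in the sketch below. The helper name mssql_conn_string and the sample values are placeholders; uid and pwd are the usual ODBC keywords for SQL Server authentication, used here in place of trusted_connection=yes.

```shell
# Assemble an ODBC connection string that uses SQL Server authentication.
# Arguments: server, database, user, password (all placeholders in the demo).
mssql_conn_string() {
  printf 'driver={ODBC Driver 17 for SQL Server};server=%s;database=%s;uid=%s;pwd=%s\n' "$1" "$2" "$3" "$4"
}

mssql_conn_string "10.0.0.5" "mydb" "app_user" "app_password"
```

The resulting string is what would be passed to odbcDriverConnect() in R, with the real credentials supplied at run time rather than written into the script.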
I think this will work if you change your driver to FreeTDS. As far as I know, the SQL Server drivers that RStudio provides are only available with a Pro plan.