Airflow on Docker: How to update git from version 2.20 to at least 2.25 when creating a dbt package

I set up an Airflow environment on Docker about a year ago, and I am not in a position to upgrade to the latest Airflow version.
A customer is trying to create a dbt package (essentially a git repo) and link to it from within dbt.
Error message: Please update your git version to pull a dbt package from a sub directory: your version is 2.20.1, >= 2.25.0 needed.
How can I upgrade git inside the image?
Here is my Dockerfile:
FROM apache/airflow:2.2.4-python3.9
RUN pip install "apache-airflow-providers-microsoft-mssql" "apache-airflow-providers-snowflake" "authlib" "Flask-OpenID"
USER root
RUN apt-get update \
 && apt-get install -y git libpq-dev python-dev python3-pip \
 && apt-get remove python-cffi
RUN pip install --upgrade cffi
RUN pip install cryptography~=3.4
USER airflow
RUN pip install airflow-dbt-python
RUN pip install dbt-sqlserver==1.0.0
RUN pip install dbt-snowflake==1.0.0
RUN dbt --version
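The error message compares the installed git against 2.25.0. A minimal sketch for running the same check yourself, for example from a shell inside the container, might look like the following. The helper name version_ge is hypothetical, and the comparison assumes GNU sort with its -V (version sort) option is available.

```shell
#!/bin/sh
# Sketch: check whether the installed git meets a minimum version.
# REQUIRED_GIT is taken from the dbt error message; version_ge is a
# hypothetical helper name, not part of git or dbt.

REQUIRED_GIT="2.25.0"

# Succeeds (exit status 0) when version $1 is >= version $2.
# Assumes GNU sort, whose -V option sorts version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v git >/dev/null 2>&1; then
    # "git version 2.20.1" -> "2.20.1"
    installed="$(git --version | awk '{print $3}')"
    if version_ge "$installed" "$REQUIRED_GIT"; then
        echo "git $installed is new enough (>= $REQUIRED_GIT)"
    else
        echo "git $installed is older than $REQUIRED_GIT"
    fi
else
    echo "git is not installed"
fi
```

A version sort is used rather than a plain string comparison because, compared as text, 2.9.0 would sort after 2.25.0.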

Related

Missing control panel icons

After running pip install jupyterlab_scheduler, the control panel icons disappeared (screenshot not included).
How can I get them back?
I run Jupyter with this Dockerfile:
FROM jupyter/scipy-notebook:latest
# Switch to root user to install cron
USER root
RUN apt-get update && apt-get upgrade -y && apt-get install --yes cron
# Switch back to Jupyter user
USER $NB_USER
RUN fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER
# Set USER to the env variable jupyter uses for the default account
ENV USER=$NB_USER
RUN pip install jupyterlab_scheduler
RUN jupyter labextension install jupyterlab_scheduler #jupyter lab build

Installing R 4.1 from source on newly installed Ubuntu 20.04

I tried to install R 4.1 on a freshly set up Ubuntu 20.04. After some struggle with the repositories and keys, I chose to install from source.
I downloaded the latest R source from https://www.r-project.org/ and ran ./configure repeatedly, noting each library it reported as missing; the list is below. I hope this saves a significant amount of time for anyone intending to build from source. Suggestions for speeding up the process, or for a better solution, are welcome. My intention is to share the entire set of libraries I had to install on a bare installation of 20.04, on which the very first package I tried to install was R 4.1 (from source).
Directory where you downloaded the tar.gz (e.g. R-4.1.2.tar.gz in my case)
cd Downloads
Untar
tar -xvzf R-4.1.2.tar.gz
Enter directory
cd R-4.1.2
As root
sudo su
Check the configuration (this reported many errors along the way; the libraries below were installed to resolve them)
./configure
Install the missing libraries:
apt-get install build-essential
apt-get install gfortran
apt-get install fort77
apt-get install libreadline-dev
apt-get install xorg-dev
apt-get install liblzma-dev libblas-dev
apt-get install gcc-multilib
apt-get install libbz2-dev
apt-get install libpcre2-dev
apt-get install libcurl4-openssl-dev
apt install default-jdk
Once ./configure completes without errors, build and install:
make
make install
The above set worked for me, and I hope it helps others.
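The individual installs above can be collected into a single command. The sketch below only prints that command instead of running it, so it is safe to try without root; the variable name R_BUILD_DEPS and the function name print_install_cmd are hypothetical, and the package list is simply the one given above.

```shell
#!/bin/sh
# Sketch: collect the build dependencies listed above into one variable
# and print (not run) a single apt-get command for them.
# R_BUILD_DEPS and print_install_cmd are hypothetical names.

R_BUILD_DEPS="build-essential gfortran fort77 libreadline-dev xorg-dev \
liblzma-dev libblas-dev gcc-multilib libbz2-dev libpcre2-dev \
libcurl4-openssl-dev default-jdk"

# Print the install command for review; nothing is installed here.
print_install_cmd() {
    printf 'apt-get install -y %s\n' "$R_BUILD_DEPS"
}

print_install_cmd
```

After reviewing the printed command, run it as root, then run ./configure again before make.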

launching R in AWS EC2

I am trying to launch an R instance on AWS EC2. I have opted for the free tier and use the Amazon Linux AMI. In the user data I specified the following to install R and RStudio:
#!/bin/bash
# install R
yum install -y R
# install RStudio-Server
wget https://download2.rstudio.org/server/centos6/x86_64/rstudio-server-rhel-1.2.5033-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-1.2.5033-x86_64.rpm
yum install -y curl-devel
yum install -y openssl openssl-devel
# add user
useradd forecasting
echo forecasting:testing | chpasswd
However, the R version is not the latest one - how do I modify this code to download the latest version of R?
Following RStudio's instructions for installing R, I think the following might work on CentOS 6. Unfortunately I can't test it, but it might give you something to work from.
#!/bin/bash
# install R runtime dependencies
yum install -y epel-release
# set R version
R_VERSION=3.6.2
# download and install R
wget https://cdn.rstudio.com/r/centos-6/pkgs/R-${R_VERSION}-1-1.x86_64.rpm
yum install -y R-${R_VERSION}-1-1.x86_64.rpm
# install RStudio-Server
wget https://download2.rstudio.org/server/centos6/x86_64/rstudio-server-rhel-1.2.5033-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-1.2.5033-x86_64.rpm
yum install -y curl-devel
yum install -y openssl openssl-devel
# add user
useradd forecasting
echo forecasting:testing | chpasswd

Error while installing Airflow: By default one of Airflow's dependencies installs a GPL

I get the following error after running the pip install airflow[postgres] command:
> raise RuntimeError("By default one of Airflow's dependencies installs
> a GPL "
>
> RuntimeError: By default one of Airflow's dependencies installs a GPL
> dependency (unidecode). To avoid this dependency set
> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when you install
> or upgrade Airflow. To force installing the GPL version set
> AIRFLOW_GPL_UNIDECODE
I am trying to install on Debian 9.
Try the following:
export AIRFLOW_GPL_UNIDECODE=yes
OR
export SLUGIFY_USES_TEXT_UNIDECODE=yes
Using export makes the environment variable available to all the subprocesses.
Also, make sure you are running pip install apache-airflow[postgres] and not pip install airflow[postgres].
Which should you use? With AIRFLOW_GPL_UNIDECODE, Airflow installs a dependency (unidecode) that is under the GPL license, whose copyleft terms may restrict how you can distribute your resulting application. If that's a problem for you, go for SLUGIFY_USES_TEXT_UNIDECODE.
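To illustrate why export (or putting the assignment on the same line as the command) matters, here is a small sketch. DEMO_FLAG is a hypothetical stand-in for SLUGIFY_USES_TEXT_UNIDECODE, and a child sh process stands in for pip; nothing is installed.

```shell
#!/bin/sh
# Sketch: when does a child process see a variable set in the shell?
# DEMO_FLAG is a hypothetical stand-in for SLUGIFY_USES_TEXT_UNIDECODE;
# the child "sh -c" stands in for pip.

# Command run in the child: print DEMO_FLAG, or "unset" if it is absent.
CHILD='printf "%s\n" "${DEMO_FLAG:-unset}"'

# 1. Plain assignment on its own line: stays local to this shell.
unset DEMO_FLAG
DEMO_FLAG=yes
plain="$(sh -c "$CHILD")"

# 2. Assignment prefixed to the command: passed to that one command only.
unset DEMO_FLAG
prefixed="$(DEMO_FLAG=yes sh -c "$CHILD")"

# 3. export: passed to every child started from now on.
export DEMO_FLAG=yes
exported="$(sh -c "$CHILD")"

echo "plain assignment: $plain"      # prints: plain assignment: unset
echo "command prefix:   $prefixed"   # prints: command prefix:   yes
echo "export:           $exported"   # prints: export:           yes
```

The first case is the one to watch for: a line containing only the assignment, followed by pip install on the next line, does not pass the variable to pip.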
If you are installing using sudo, run one of these commands:
sudo AIRFLOW_GPL_UNIDECODE=yes pip3 install apache-airflow
OR
sudo SLUGIFY_USES_TEXT_UNIDECODE=yes pip3 install apache-airflow
NOTE: If pip3 (python3) does not work for you, try the pip command.
The pip command can point to either a python2 or a python3 installation, depending on your system. Verify this by running pip --version.
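As a small sketch of that check, the function below pulls the Python version out of a pip --version line. The function name pip_python_version is hypothetical, and the sample line is an illustrative example of the usual output format, not output captured from any particular system.

```shell
#!/bin/sh
# Sketch: extract the Python version from a "pip --version" line on stdin.
# pip_python_version is a hypothetical helper name.
pip_python_version() {
    sed -n 's/.*(python \([0-9][0-9.]*\)).*/\1/p'
}

# Illustrative sample line (assumed format: "pip X from PATH (python Y)").
sample='pip 21.3.1 from /usr/lib/python3/dist-packages/pip (python 3.9)'
printf '%s\n' "$sample" | pip_python_version   # prints 3.9
```

On a real system the same function can be fed directly: pip --version | pip_python_version.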
Windows users can run the command below before installing apache-airflow:
set AIRFLOW_GPL_UNIDECODE=yes
then
pip install apache-airflow
If you are installing Airflow on Windows from a terminal, run:
set SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow[postgres]
This worked for me after I struggled with many other options. I hope it works for you too.
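The command below should install apache-airflow so that you can work with it in PyCharm to build DAGs and write code for Airflow. The variable assignment must be on the same line as pip install; on a separate line without export it would not reach pip.
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow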
Also, if you are installing using sudo you can use:
export AIRFLOW_GPL_UNIDECODE='yes'
sudo -E pip3 install apache-airflow
(or use SLUGIFY_USES_TEXT_UNIDECODE)
Run the following command in your terminal: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow==1.10.0
Use the command below to install apache-airflow with a set of extras:
sudo SLUGIFY_USES_TEXT_UNIDECODE=yes \
pip install apache-airflow[async,devel,celery,crypto,druid,gcp_api,jdbc,hdfs,hive,kerberos,ldap,password,postgres,qds,rabbitmq,s3,samba,slack]

Make apt install packages from a specific repository

Is there any way to have apt install a package from a specific launchpad repository?
I would like to set up a little test server and install all of the 1000+ r-cran-* packages from the cran2deb4ubuntu Launchpad repository. As of last month, all packages in this repository are built for R 3.0.1. So I installed a copy of R 3.0.1 and then ran:
sudo add-apt-repository ppa:marutter/c2d4u
sudo apt-get update
sudo apt-get install r-cran-*
However, this will also install all of the r-cran-* packages from universe, which are built for R 2.15 and hence will fail to load. Is there an easy way to install the packages only from c2d4u? Or alternatively, is there a way to blacklist the r-cran-* packages in universe from apt?
What I ended up doing was simply installing all packages and then removing the ones with an old build, i.e.
sudo add-apt-repository ppa:marutter/c2d4u -y
sudo add-apt-repository ppa:marutter/rrutter -y
sudo apt-get update
sudo apt-get install r-bioc-*
sudo apt-get install r-cran-*
And then in R:
ip <- installed.packages()
rownames(ip)[package_version(ip[, "Built"]) < "3.0.0"]
